Friday, 30 September 2016

Big Data Hadoop Interview Questions – Sandeep Kanao


What is Big Data?  - Hadoop Interview Questions – Sandeep Kanao


Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.


What are Grid, Cloud and Cluster? - Hadoop Interview Questions – Sandeep Kanao

Cloud: a cloud is simply an aggregate of computing power. For your purposes, you can think of the entire "cloud" as a single server. It is conceptually much like an old-school mainframe to which you could submit your jobs and have it return the result, except that nowadays the concept is applied more widely (i.e. not just raw computing, but also entire services, storage and so on).

Grid: a grid is simply many computers which together might solve a given problem or crunch data. The fundamental difference between a grid and a cluster is that in a grid each node is relatively independent of the others; problems are solved in a divide-and-conquer fashion.

Cluster: conceptually, a cluster combines many machines into one really big and powerful one. This is a much more difficult architecture than a cloud or a grid to get right, because you have to orchestrate all the nodes to work together and provide consistency of things such as caches, memory, not to mention clocks. Clouds have much the same problem, but unlike clusters, clouds are not conceptually one big machine, so the architecture does not have to treat them as such. You can, for instance, choose not to allocate the full capacity of your data center to a single request, whereas that is rather the point of a cluster: to be able to throw 100% of its power at a single problem.


What are the examples of Big Data? - Hadoop Interview Questions – Sandeep Kanao


Black Box Data: A black box is a component of helicopters, airplanes, jets, etc. It captures the voices of the flight crew, recordings from microphones and earphones, and the performance information of the aircraft.

Social Media Data: Social media such as Facebook and Twitter hold the information and views posted by millions of people across the globe.

Stock Exchange Data: Stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on shares of different companies.

Power Grid Data: Power grid data holds information about the power consumed by a particular node with respect to a base station.

Transport Data: Transport data includes the model, capacity, distance and availability of a vehicle.

Search Engine Data: Search engines retrieve lots of data from different databases.


Thus Big Data involves huge volume, high velocity, and an extensive variety of data. This data comes in three types:

Structured data: Relational data.

Semi Structured data: XML data.

Unstructured data: Word, PDF, Text, Media Logs.



What are Big Data Technologies? - Hadoop Interview Questions – Sandeep Kanao 

There are various technologies in the market from different vendors including Amazon, IBM, Microsoft, etc., to handle big data. While looking into the technologies that handle big data, we examine the following two classes of technology:

Operational Big Data

These include systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.

Analytical Big Data

These include systems like Massively Parallel Processing (MPP) databases and MapReduce that provide analytical capabilities for retrospective and complex analysis that may touch most or all of the data.

MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL, and a system based on MapReduce can be scaled from a single server to thousands of high- and low-end machines.


What are the major challenges associated with Big Data? - Hadoop Interview Questions – Sandeep Kanao

The major challenges associated with big data are as follows:

Capturing data

Curation

Storage

Searching

Sharing

Transfer

Analysis

Presentation


These problems can be addressed using an algorithm called MapReduce, introduced by Google. This algorithm divides the task into small parts, assigns them to many computers, and collects the results from them, which, when integrated, form the result dataset. A minimal sketch of the idea is shown below.
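
A minimal, self-contained illustration of this divide-and-conquer idea in plain Java (no Hadoop involved; the word-count task, class name and sample data are chosen purely for illustration):

import java.util.*;
import java.util.stream.*;

public class MapReduceIdea {

    // Map phase: count the words in one chunk (each chunk could be processed on a different machine).
    static Map<String, Integer> countWords(String chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : chunk.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The task is divided into small, independent chunks.
        List<String> chunks = Arrays.asList("big data big", "data hadoop", "hadoop big data");

        // Map: process each chunk independently.
        List<Map<String, Integer>> partials = chunks.stream()
                .map(MapReduceIdea::countWords)
                .collect(Collectors.toList());

        // Reduce: collect the partial results and integrate them into the result dataset.
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> result.merge(word, count, Integer::sum));
        }
        System.out.println(result); // {big=3, data=3, hadoop=2} (iteration order may vary)
    }
}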


Wednesday, 28 September 2016

Capital Market Interview Questions - What is time value of money? - Sandeep Kanao


What is time value of money? - Sandeep Kanao
Time value of money is the value that money earns over a given period of time in the form of interest.
For example, if USD 200 is invested for one year at 5% interest, it will be worth USD 210 after one year. Future value can be predicted using the time value of money formula; a small illustrative calculation is sketched below.
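
A rough sketch of this calculation in Java (the class, method and variable names are chosen just for this example):

public class TimeValueOfMoney {

    // Future value of a present amount compounded annually: FV = PV * (1 + r)^n
    static double futureValue(double presentValue, double annualRate, int years) {
        return presentValue * Math.pow(1 + annualRate, years);
    }

    public static void main(String[] args) {
        // USD 200 invested at 5% for one year is worth approximately USD 210.
        System.out.println(futureValue(200, 0.05, 1));
    }
}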

Overview of Hadoop MapReduce – Sandeep Kanao 



Hadoop MapReduce is a software framework for processing vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.


A MapReduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system.


The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.


Hadoop's configuration allows the framework to schedule tasks on the nodes where the data is already present (data locality), resulting in very high aggregate bandwidth across the cluster.


The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster-node.  


The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them and re-executing the failed tasks.


The slaves execute the tasks as directed by the master.


A Hadoop job configuration specifies the map and reduce functions, via implementations of the appropriate interfaces or abstract classes, along with other job parameters.


The Hadoop job client then submits the job (jar/executable etc.) and its configuration to the JobTracker, which then takes responsibility for distributing the software/configuration to the slaves, scheduling and monitoring the tasks, and providing status and diagnostic information to the job client. A minimal driver sketch is given below.
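
As a rough sketch of such a job configuration and submission, here is the classic word-count driver using the org.apache.hadoop.mapreduce API (WordCountMapper and WordCountReducer are assumed user implementations; they are sketched further below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");           // job name is arbitrary
        job.setJarByClass(WordCountDriver.class);                 // jar to distribute to the slaves
        job.setMapperClass(WordCountMapper.class);                // user-supplied map function
        job.setCombinerClass(WordCountReducer.class);             // optional local combine step
        job.setReducerClass(WordCountReducer.class);              // user-supplied reduce function
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // input directory (e.g. on HDFS)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);         // submit the job and wait
    }
}

Re-using the reducer as a combiner works here only because word-count summation is associative and commutative.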




Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer. 


Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications.


Inputs and Outputs - Overview of Hadoop MapReduce – Sandeep Kanao


The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.


The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
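
For illustration only, a hypothetical custom key type implementing WritableComparable might look like this (the class and field names are made up for this sketch):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key (symbol, timestamp) that the framework can serialize and sort.
public class TradeKey implements WritableComparable<TradeKey> {
    private String symbol = "";
    private long timestamp;

    public void set(String symbol, long timestamp) {
        this.symbol = symbol;
        this.timestamp = timestamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {       // serialization
        out.writeUTF(symbol);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {    // deserialization
        symbol = in.readUTF();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(TradeKey other) {                        // ordering used by the sort phase
        int bySymbol = symbol.compareTo(other.symbol);
        return bySymbol != 0 ? bySymbol : Long.compare(timestamp, other.timestamp);
    }
}

In practice hashCode() and equals() should also be overridden so that the default partitioner distributes keys consistently.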


Input and Output types of a MapReduce job - Sandeep Kanao:


(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3,v3> (output)
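
For the word-count example assumed above, the types line up with this flow roughly as follows: k1 = byte offset of a line, v1 = the line of text, k2 = a word, v2 = a partial count, k3 = a word, v3 = its total count. A sketch of the corresponding mapper and reducer (the class names are simply the ones assumed in the driver sketch earlier; in a real project each class would live in its own file and be declared public):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// map: <k1 = LongWritable (byte offset), v1 = Text (line)> -> <k2 = Text (word), v2 = IntWritable (1)>
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);                  // emit <word, 1> for every word in the line
        }
    }
}

// reduce: <k2 = Text (word), list of v2> -> <k3 = Text (word), v3 = IntWritable (total count)>
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        total.set(sum);
        context.write(key, total);                     // emit <word, total count>
    }
}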