Download the PHP package himedia/emr-monitoring without Composer
Package emr-monitoring
Short Description Command line tool for monitoring Amazon Elastic MapReduce (Amazon EMR) jobflows and analyzing past jobflows.
License Apache-2.0
Information about the package emr-monitoring
EMR Monitoring
Command line tool for monitoring Amazon Elastic MapReduce (Amazon EMR) jobflows and analyzing past jobflows.
Table of Contents
- Overview
- Description
- Retrieve information from many places
- All that information is gathered in one screen
- Task timeline
- Installing
- Git clone
- Configuration
- Dependencies
- Usage
- Command line options
- With a finished jobflow
- With a new jobflow
- Documentation
- Copyrights & licensing
- ChangeLog
- Git branching model
Overview
Description
Retrieve information from many places
- Amazon EMR, via the Amazon Elastic MapReduce Ruby Client, to get the description of a jobflow.
- Amazon EC2, via the Amazon EC2 API Tools, to retrieve the spot instance price history.
- Amazon S3, via S3cmd, to get the size of both input and output files, to retrieve potential errors and to get the log summary.
- Amazon Elastic MapReduce pricing of On-Demand instances, via the Amazon Elastic MapReduce pricing page and its underlying JSON service.
- Hadoop JobTracker, running on the master node and accessed through an automatic SSH tunnel.
- Additionally, EMR Monitoring computes elapsed times between various events and estimates the jobflow's total cost.
All that information is gathered in one screen
An animation is better than a thousand words:
Result with a completed jobflow:
Some clarifications
Price
- The ask price for spot instances comes in real time from the EC2 API Tools.
- The total price in the general section is the sum of the prices of each instance group, i.e. for each group: <instance-price> × <number-of-instances> × ceil(<number-of-hours>).
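As a minimal sketch of the per-group formula above, the following PHP function sums price × count × ceil(hours) over all instance groups. The function name estimateTotalPrice and the array keys price, count and hours are hypothetical, not part of EMR Monitoring's API, and the figures in the example are illustrative only.

```php
<?php
/**
 * Hypothetical sketch of the total-price estimate: for each instance group,
 * <instance-price> × <number-of-instances> × ceil(<number-of-hours>), summed over groups.
 *
 * @param array $instanceGroups list of array('price' => USD per hour, 'count' => int, 'hours' => float)
 * @return float estimated total price in USD
 */
function estimateTotalPrice(array $instanceGroups)
{
    $total = 0.0;
    foreach ($instanceGroups as $group) {
        // ceil() rounds a partial hour up to a full hour, as in the formula above.
        $total += $group['price'] * $group['count'] * ceil($group['hours']);
    }
    return $total;
}

// Illustrative figures: 1 master at 0.30 USD/h and 4 core instances at 0.12 USD/h, for 2.3 hours.
$groups = array(
    array('price' => 0.30, 'count' => 1, 'hours' => 2.3),
    array('price' => 0.12, 'count' => 4, 'hours' => 2.3),
);
printf("%.2f\n", estimateTotalPrice($groups)); // 0.30×1×3 + 0.12×4×3 = 2.34
```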
Elapsed times
- Elapsed times in gray measure the time between the initialization and the start date of an instance or step, and between its start date and end date.
- When the start date or end date is unknown, elapsed times are computed using the local time and a ≈ sign is added.
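A minimal PHP sketch of the ≈ rule, assuming Unix timestamps and assuming the unknown date is the later one of the pair (for example a step that has started but not yet ended). The function name formatElapsedTime and the 1h02m05s output format are hypothetical, not EMR Monitoring's actual display.

```php
<?php
/**
 * Hypothetical sketch of the elapsed-time rule (not EMR Monitoring's actual code).
 * $from and $to are Unix timestamps; $to === null means the later date is still unknown,
 * in which case the current local time is used and the result is prefixed with "≈".
 */
function formatElapsedTime($from, $to, $now = null)
{
    $prefix = '';
    if ($to === null) {
        $to = ($now === null) ? time() : $now;
        $prefix = '≈ ';
    }
    $seconds = max(0, (int) ($to - $from));
    return sprintf(
        '%s%dh%02dm%02ds',
        $prefix,
        intdiv($seconds, 3600),
        intdiv($seconds % 3600, 60),
        $seconds % 60
    );
}

$start = 1372400000; // illustrative Unix timestamp
echo formatElapsedTime($start, $start + 3725), PHP_EOL;     // 1h02m05s
echo formatElapsedTime($start, null, $start + 90), PHP_EOL; // ≈ 0h01m30s
```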
Completion percentages
Completion percentages are computed from Hadoop JobTracker data and are NOT the number of remaining tasks divided by the number of completed tasks.
Error messages
Error messages, if any, are always displayed:
Task timeline
A task timeline is generated with gnuplot for an in-progress or past jobflow. It includes all jobs and details the number of mapper, shuffle, merge and reducer tasks.
Animation from generated task timelines throughout jobflow run:
Result with a completed jobflow:
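The following PHP sketch shows one way such a timeline could be fed to gnuplot: one whitespace-separated data row per sample, plus a script that draws one line per task type. The function names, the data layout and the plot settings are assumptions for illustration, not EMR Monitoring's actual files.

```php
<?php
/**
 * Hypothetical sketch: serialize task counts sampled over time into a gnuplot data file.
 * Each sample is array(elapsed seconds, mappers, shuffle, merge, reducers).
 */
function buildTimelineData(array $samples)
{
    $lines = array('# time mappers shuffle merge reducers'); // gnuplot skips lines starting with #
    foreach ($samples as $sample) {
        $lines[] = implode(' ', $sample);
    }
    return implode("\n", $lines) . "\n";
}

/**
 * Hypothetical gnuplot script: one line per task type, rendered as a PNG image.
 * The trailing "\\" in the heredoc becomes a single "\", gnuplot's line continuation.
 */
function buildGnuplotScript($dataFile, $pngFile)
{
    return <<<GNUPLOT
set terminal png size 1200,600
set output '$pngFile'
set xlabel 'elapsed time (s)'
set ylabel 'number of tasks'
plot '$dataFile' using 1:2 with lines title 'mappers', \\
     '' using 1:3 with lines title 'shuffle', \\
     '' using 1:4 with lines title 'merge', \\
     '' using 1:5 with lines title 'reducers'

GNUPLOT;
}
```

For example, write both strings with file_put_contents() to timeline.dat and timeline.plt, then run gnuplot timeline.plt to produce timeline.png.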
Installing
Git clone
Create a folder, e.g. /usr/local/lib/emr-monitoring, and cd into it.
Then clone the repository (the folder must be empty!):
Configuration
Initialize the configuration file from conf/config-dist.php and adapt it:
If Bash is not your default shell, set $aConfig['Himedia\EMR']['shell'] to the path of your Bash interpreter, e.g. /bin/bash.
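For instance, the relevant line of conf/config.php could look like the following; /bin/bash is only the example path given above. The is_executable() check is an illustrative sanity check you could run separately, not something EMR Monitoring is known to do.

```php
<?php
// Illustrative fragment of conf/config.php: path to the Bash interpreter
// (`which bash` prints it on most systems).
$aConfig['Himedia\EMR']['shell'] = '/bin/bash';

// Hypothetical sanity check, not part of EMR Monitoring.
if (!is_executable($aConfig['Himedia\EMR']['shell'])) {
    fwrite(STDERR, 'Bash not found at ' . $aConfig['Himedia\EMR']['shell'] . PHP_EOL);
}
```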
Dependencies
All dependencies are checked at launch and EMR Monitoring systematically helps to resolve them.
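A launch-time check of command-line dependencies can be as simple as searching the PATH. The sketch below is hypothetical (findExecutable is not EMR Monitoring's code), and the list of command names is an assumption for illustration.

```php
<?php
/**
 * Hypothetical sketch: return the full path of $command if it is found in one of the
 * PATH directories (or in $path when given), or null otherwise.
 */
function findExecutable($command, $path = null)
{
    if ($path === null) {
        $path = (string) getenv('PATH');
    }
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $command;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Command names assumed for illustration.
$missing = array();
foreach (array('s3cmd', 'gnuplot', 'ec2-describe-spot-price-history') as $cmd) {
    if (findExecutable($cmd) === null) {
        $missing[] = $cmd;
    }
}
if ($missing) {
    fwrite(STDERR, 'Missing dependencies: ' . implode(', ', $missing) . PHP_EOL);
}
```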
Composer dependencies
PHP class autoloading and PHP dependencies are managed by Composer.
To set up the project dependencies with Composer, run one of the following commands:
If needed, to install Composer locally, run one of the following commands:
Read http://getcomposer.org/doc/00-intro.md#installation-nix for more information.
EMR CLI
The Amazon Elastic MapReduce Ruby Client is needed to get the description of a jobflow. Warning: it requires Ruby 1.8.7 and is not compatible with later versions of Ruby.
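To illustrate what "description of a jobflow" means, here is a hypothetical PHP sketch that extracts the jobflow state and step states from the JSON printed by the Ruby client (for example with its --describe option). The field names JobFlows, JobFlowId, ExecutionStatusDetail, Steps and StepConfig are assumed from the legacy DescribeJobFlows format; check them against your own output.

```php
<?php
/**
 * Hypothetical sketch: summarize the first jobflow of a describe JSON document.
 * Field names are assumptions based on the legacy DescribeJobFlows format.
 */
function summarizeJobflow($json)
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['JobFlows'])) {
        throw new InvalidArgumentException('No jobflow found in JSON input.');
    }
    $jobflow = $data['JobFlows'][0];
    $summary = array(
        'id'    => $jobflow['JobFlowId'],
        'state' => $jobflow['ExecutionStatusDetail']['State'],
        'steps' => array(),
    );
    // Steps are kept as a list because several steps may share the same name.
    foreach (isset($jobflow['Steps']) ? $jobflow['Steps'] : array() as $step) {
        $summary['steps'][] = array(
            'name'  => $step['StepConfig']['Name'],
            'state' => $step['ExecutionStatusDetail']['State'],
        );
    }
    return $summary;
}
```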
To install the Amazon EMR Command Line Interface:
Create a file named /usr/local/lib/elastic-mapreduce-cli/credentials.json with at least the following lines:
The key-pair-file key is used in particular to open an SSH tunnel to the master node and query the Hadoop JobTracker.
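As an illustration of how the key-pair-file value can serve for the tunnel, here is a hypothetical PHP helper that reads credentials.json and builds an ssh local port-forwarding command. The helper name, the hadoop login and the JobTracker port 9100 are assumptions (typical of older EMR AMIs), not values taken from EMR Monitoring.

```php
<?php
/**
 * Hypothetical sketch: build an SSH command forwarding a local port to the JobTracker
 * on the master node, using the key-pair-file entry of credentials.json.
 * The "hadoop" login and port 9100 are assumptions.
 */
function buildTunnelCommand($credentialsFile, $masterDns, $localPort = 9100, $remotePort = 9100)
{
    $credentials = json_decode((string) file_get_contents($credentialsFile), true);
    if (empty($credentials['key-pair-file'])) {
        throw new RuntimeException("No 'key-pair-file' key in $credentialsFile");
    }
    // -N: no remote command, only forwarding; -L: local port forwarding.
    return sprintf(
        'ssh -i %s -N -L %d:localhost:%d %s',
        escapeshellarg($credentials['key-pair-file']),
        $localPort,
        $remotePort,
        escapeshellarg('hadoop@' . $masterDns)
    );
}
```

Running the returned command keeps the tunnel open, after which the JobTracker web interface would be reachable at http://localhost:9100/ under the same port assumption.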
If necessary, adapt the emr_cli_bin, aws_access_key and aws_secret_key keys of $aConfig['Himedia\EMR'] in conf/config.php.
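For example, the corresponding entries in conf/config.php might look like this. All values are placeholders: in particular the emr_cli_bin path is hypothetical and depends on where the EMR CLI was installed. The check at the end is an illustrative addition.

```php
<?php
// Illustrative fragment of conf/config.php; every value below is a placeholder.
$aConfig['Himedia\EMR']['emr_cli_bin']    = '/usr/local/lib/elastic-mapreduce-cli/elastic-mapreduce';
$aConfig['Himedia\EMR']['aws_access_key'] = 'YOUR_ACCESS_KEY_ID';
$aConfig['Himedia\EMR']['aws_secret_key'] = 'YOUR_SECRET_ACCESS_KEY';

// Hypothetical check that the keys mentioned above are filled in.
foreach (array('emr_cli_bin', 'aws_access_key', 'aws_secret_key') as $key) {
    if (empty($aConfig['Himedia\EMR'][$key])) {
        fwrite(STDERR, "Missing \$aConfig['Himedia\\EMR']['$key'] in conf/config.php" . PHP_EOL);
    }
}
```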
Read http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-install.html for more information.
EC2 API Tools
The Amazon EC2 API Tools are used to retrieve the spot instance price history.
To install the Amazon EC2 API Tools:
For example, add these commands to your ~/.bashrc and reload it:
Read http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/setting_up_ec2_command_linux.html for more information.
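The spot price history returned by the EC2 API Tools has to be reduced to a current price per instance type. The hypothetical PHP sketch below does that, assuming ec2-describe-spot-price-history prints one line per price point in the form SPOTINSTANCEPRICE, price, timestamp, instance type, product description and availability zone; verify that layout against your installed version.

```php
<?php
/**
 * Hypothetical sketch: keep the most recent spot price per instance type from the
 * text output of ec2-describe-spot-price-history. The line layout is an assumption:
 * SPOTINSTANCEPRICE <price> <timestamp> <instance-type> <product-description> [<zone>]
 */
function latestSpotPrices($output)
{
    $latest = array();
    foreach (preg_split('/\R/', trim($output)) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 4 || $fields[0] !== 'SPOTINSTANCEPRICE') {
            continue; // skip blank or unexpected lines
        }
        list(, $price, $date, $type) = $fields;
        $time = strtotime($date);
        if (!isset($latest[$type]) || $time > $latest[$type]['time']) {
            $latest[$type] = array('time' => $time, 'price' => (float) $price);
        }
    }
    // Keep only the price, indexed by instance type.
    return array_map(function ($entry) {
        return $entry['price'];
    }, $latest);
}
```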
S3cmd
S3cmd is required to get the size of both input and output files, to retrieve potential errors and to get the log summary.
Please run:
Read http://s3tools.org/s3cmd for more information.
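Getting the size of input and output files with S3cmd typically goes through s3cmd du. The hypothetical PHP sketch below parses that output and formats it, assuming the first field of the output line is the total size in bytes; check this against your installed s3cmd version. The function names are illustrative.

```php
<?php
/**
 * Hypothetical sketch: extract the byte count from the output of `s3cmd du s3://bucket/path/`,
 * assuming the line starts with the total size in bytes.
 */
function parseS3cmdDuBytes($output)
{
    if (!preg_match('/^\s*(\d+)\s/', $output, $m)) {
        throw new UnexpectedValueException("Unexpected s3cmd du output: $output");
    }
    return (int) $m[1];
}

/**
 * Format a byte count with binary units, e.g. 1572864 -> "1.5 MiB".
 */
function humanSize($bytes)
{
    $units = array('B', 'KiB', 'MiB', 'GiB', 'TiB');
    $i = 0;
    $size = (float) $bytes;
    while ($size >= 1024 && $i < count($units) - 1) {
        $size /= 1024;
        $i++;
    }
    return sprintf('%.1f %s', $size, $units[$i]);
}
```

For example, humanSize(parseS3cmdDuBytes((string) shell_exec('s3cmd du ' . escapeshellarg('s3://my-bucket/input/')))) would print the size of a placeholder bucket prefix.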
Gnuplot
Task timelines are generated with gnuplot for in-progress or past jobflows and detail the number of mapper, shuffle, merge and reducer tasks.
Usage
Command line options
You can view the options by running:
With a finished jobflow
Simply:
With a new jobflow
- Launch a jobflow using Amazon Elastic MapReduce:
- You can see it in the list of all jobflows:
- Start monitoring the jobflow:
You can easily view the task timeline with, for example, Eye of Gnome:
Documentation
API documentation is generated by ApiGen and included in the doc/api folder.
Copyrights & licensing
Licensed under the Apache License 2.0. See LICENSE file for details.
ChangeLog
See CHANGELOG file for details.
Git branching model
The Git branching model used for development is the one described and assisted by the twgit tool: https://github.com/Twenga/twgit.
All versions of emr-monitoring with dependencies
psr/log Version 1.0.0
geoffroy-aubry/errorhandler Version 1.*
geoffroy-aubry/logger Version 1.*
geoffroy-aubry/helpers Version 1.*
ulrichsg/getopt-php Version dev-master