bcg/rubydoop (forked from jbr/rubydoop)

Rubydoop - Simple Ruby Sugar for Hadoop Streaming

Example - Inverted Index

Input: one record per line, in the form file@linenum\tline
(the line part may itself contain tabs or spaces, and usually holds many words)

Desired output: Each word, stripped of punctuation, paired with a comma-delimited list of file@linenum locations for quick lookup.
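
To make the input and output formats concrete, here is a minimal plain-Ruby sketch that builds the same index in memory, without Hadoop or rubydoop. The method name `build_index` and the sample records are hypothetical, and the punctuation handling (trimming non-letters from both ends of each token) is an assumption about what "stripped of punctuation" should mean.

```ruby
# Build a word => [locations] index from "file@linenum\tline" records.
# build_index is a hypothetical helper, not part of rubydoop.
def build_index(records)
  index = Hash.new { |hash, word| hash[word] = [] }
  records.each do |record|
    # Split on the first tab only, because the line itself may contain tabs.
    location, line = record.chomp.split("\t", 2)
    line.to_s.split(/\s+/).each do |token|
      # Assumed cleanup rule: lowercase, then trim non-letters at both ends.
      word = token.downcase.gsub(/\A[^a-z]+|[^a-z]+\z/, "")
      index[word] << location unless word.empty?
    end
  end
  index
end

sample = ["a.txt@1\tHello, world", "a.txt@2\thello\tagain"]
build_index(sample).sort.each do |word, locations|
  puts "#{word}\t#{locations.join(',')}"
end
# Prints:
# again	a.txt@2
# hello	a.txt@1,a.txt@2
# world	a.txt@1
```

The limit of 2 on the first `split` is what keeps a tab inside the text from being mistaken for the separator between location and line.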

inverted-index.rb


#!/usr/bin/ruby
require "rubydoop"

HADOOP_HOME = "/usr/local/hadoop"

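map do |location, line|
  line.split(/\s+/).each do |word|
    # Lowercase, then trim non-letters from both ends ("(Hello," becomes "hello").
    term = word.downcase.gsub(/\A[^a-z]+|[^a-z]+\z/, '')
    next if term.empty?  # skip tokens that were only punctuation or digits
    emit term, location
  end
end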

reduce do |key, values|
  emit key, values.join(",")
end

Running

./inverted-index.rb start

Assuming your Hadoop environment is set up and HADOOP_HOME in the script points at your installation, this launches a Hadoop Streaming job that uses the script's map and reduce blocks.

Testing/Simulating

./inverted-index.rb simulate test-file.txt

This runs a poor man's local MapReduce:

cat test-file.txt | ./inverted-index.rb map | sort | ./inverted-index.rb reduce
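
The pipeline above works because `sort` places identical keys on adjacent lines, so the reduce step only needs to group consecutive lines. The following plain-Ruby sketch illustrates that idea. It is not rubydoop's actual implementation, which is not shown here; the method names `map_record`, `reduce_sorted`, and `simulate` are hypothetical, and the map step only lowercases words to keep the example short.

```ruby
# Map step: one "location\tline" record becomes several "word\tlocation" lines.
def map_record(record)
  location, line = record.chomp.split("\t", 2)
  line.to_s.split(/\s+/).reject(&:empty?).map do |word|
    "#{word.downcase}\t#{location}"
  end
end

# Reduce step: input lines are already sorted, so equal keys are adjacent
# and Enumerable#chunk can group them in a single pass.
def reduce_sorted(lines)
  lines.map { |l| l.split("\t", 2) }
       .chunk { |key, _value| key }
       .map { |key, pairs| "#{key}\t#{pairs.map(&:last).join(',')}" }
end

# Equivalent of: cat file | map | sort | reduce
def simulate(records)
  reduce_sorted(records.flat_map { |r| map_record(r) }.sort)
end

puts simulate(["a.txt@1\tto be", "a.txt@2\tor not to be"])
# Prints:
# be	a.txt@1,a.txt@2
# not	a.txt@2
# or	a.txt@2
# to	a.txt@1,a.txt@2
```

If the sort were skipped, `chunk` would emit the same key more than once, which is why the `sort` stage in the shell pipeline cannot be left out.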
