Some context

This is a live demo of batch processing with Ooso. Our example dataset contains records of yellow taxi trips in New York City. Each record has a ratecode ID field that we want to replace with an explicit ratecode name. The job looks up each ID in an external data source and replaces it with the matching name.
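The per-record replacement described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Ooso's API or the demo's actual code: the class `RatecodeEnricher`, the method `enrich`, the comma-separated layout, and the column index are all hypothetical, and the in-memory map stands in for the external data source (its names follow the public NYC TLC data dictionary).

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the Ooso library.
class RatecodeEnricher {

    // Assumption: stand-in for the external data source, held in memory.
    static final Map<String, String> RATECODE_NAMES = Map.of(
            "1", "Standard rate",
            "2", "JFK",
            "3", "Newark",
            "4", "Nassau or Westchester",
            "5", "Negotiated fare",
            "6", "Group ride");

    // Replaces the ratecode ID in one comma-separated record with its name.
    // Records with an out-of-range column or an unknown ID are returned unchanged.
    static String enrich(String csvLine, int ratecodeColumn) {
        String[] fields = csvLine.split(",", -1); // -1 keeps trailing empty fields
        if (ratecodeColumn < 0 || ratecodeColumn >= fields.length) {
            return csvLine;
        }
        String id = fields[ratecodeColumn].trim();
        fields[ratecodeColumn] = RATECODE_NAMES.getOrDefault(id, fields[ratecodeColumn]);
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Assumed three-column record: vendor ID, ratecode ID, total amount.
        System.out.println(enrich("2,1,18.5", 1));
    }
}
```

In a MapReduce setting, a function like this would run once per input line in the map phase. Leaving unknown IDs untouched is a design choice made here so that no record is lost.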

Run the job

Starting the job processes a dataset of 285,697 lines.

  • Start
  • Cleanup

Input File Sample

VendorId ... RateCodeId ... TotalAmount

Output File Sample

VendorId ... RateCode ... TotalAmount
No output yet. Click "Start" to launch the job and start polling for results.

Price and performance

The previous example processes a very small dataset (50 MB), but Ooso is meant to process large datasets. Its price and execution time scale linearly with dataset size. Below are the results for the same queries on a 200 GB dataset.

About Ooso

Ooso is a Java library that leverages AWS Lambda and Amazon S3 to run serverless MapReduce jobs without a Hadoop or Spark cluster. You can run ad hoc queries and batch processing jobs (e.g., data cleaning and enrichment) while focusing on your business logic rather than on operational constraints.

Ooso automatically scales according to your dataset size. Whether you have a thousand or a million data splits, it runs just the right number of computing units, so you pay only for what you actually need.
