This is a live demo of batch processing using Ooso. Our example dataset contains records of yellow taxi trips in New York City. One field in each record is a ratecode ID, which we want to replace with an explicit ratecode name. The job looks up each ID in an external data source and replaces it with the matching name.
When you start the job, a dataset of 285,697 lines is processed.
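The per-record replacement described above can be sketched in plain Java. This is a minimal illustration only, not the demo's actual code and not the Ooso API: the class `RatecodeEnricher`, the method `enrich`, the in-memory lookup table (standing in for the external data source), the column index, and the sample record are all assumptions made for the example. The ratecode names follow the NYC TLC data dictionary and may differ from the names the demo uses.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the Ooso library.
final class RatecodeEnricher {

    // In-memory stand-in for the external data source (assumption).
    // Names taken from the NYC TLC yellow taxi data dictionary.
    static final Map<String, String> RATECODE_NAMES = Map.of(
            "1", "Standard rate",
            "2", "JFK",
            "3", "Newark",
            "4", "Nassau or Westchester",
            "5", "Negotiated fare",
            "6", "Group ride");

    private RatecodeEnricher() {}

    /**
     * Replaces the ratecode id at columnIndex of a comma-separated record
     * with its name. Records that are too short or carry an unknown id are
     * returned unchanged. Quoted CSV fields are not handled.
     */
    static String enrich(String csvLine, int columnIndex, Map<String, String> names) {
        // A limit of -1 keeps trailing empty fields, so the record shape is preserved.
        String[] fields = csvLine.split(",", -1);
        if (columnIndex < 0 || columnIndex >= fields.length) {
            return csvLine;
        }
        String name = names.get(fields[columnIndex].trim());
        if (name == null) {
            return csvLine;
        }
        fields[columnIndex] = name;
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Made-up record; the ratecode id is assumed to sit in column 5.
        String record = "2,2016-01-01 00:00:00,2016-01-01 00:10:00,1,2.5,2,N";
        System.out.println(enrich(record, 5, RATECODE_NAMES));
        // prints: 2,2016-01-01 00:00:00,2016-01-01 00:10:00,1,2.5,JFK,N
    }
}
```

Because each record is transformed independently of the others, a step like this fits naturally in the map phase of a MapReduce job, where every data split can be processed in parallel.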
Input File Sample
Output File Sample
The previous example processes a very small dataset of 50 MB, but Ooso is meant to process large datasets. Its price and execution time scale linearly with the dataset size. Below are the results for the same queries on a 200 GB dataset.
Ooso is a Java library that leverages AWS Lambda and Amazon S3 to run serverless MapReduce jobs without any Hadoop or Spark cluster. You can run ad hoc queries and batch processing jobs (e.g., data cleaning and enrichment) while focusing on your business logic rather than on operational constraints.
Ooso automatically scales according to your dataset size. Whether you have a thousand or a million data splits, it runs just the right number of computing units, so you only pay for what you actually need.