The rapid growth of the Internet and World Wide Web has made huge amounts of information available online, creating the need for Big Data processing capabilities.
Business and government organizations create large amounts of both structured and unstructured information which needs to be processed, analyzed, and linked.
The National Science Foundation has identified key issues in data-intensive computing, such as programming abstractions (models, languages, and algorithms) that allow a natural expression of parallel processing of data.
Data-centric programming languages provide a processing approach in which applications are expressed in terms of high-level operations on data, and the runtime system transparently controls the scheduling, execution, load balancing, communications, and movement of programs and data across the computing cluster.
These include Pig, a high-level data-flow programming language and execution framework for data-intensive computing. Pig was developed to provide a specific data-centric language notation for data analysis applications, improve programmer productivity, and reduce development cycles when using the Hadoop MapReduce environment.
The Pig language provides built-in capabilities for loading, storing, filtering, grouping, de-duplication, ordering, sorting, aggregation, and joining operations on data.
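As an illustration, several of these operations might be combined in a short Pig Latin script like the following sketch (the file path, schema, and field names are assumptions for the example, not taken from the text above):

```pig
-- Load raw records; path and schema are illustrative assumptions
raw = LOAD 'logs/visits.txt' USING PigStorage('\t')
      AS (user:chararray, url:chararray, time:long);

-- Filtering, de-duplication, grouping, aggregation, and ordering
recent    = FILTER raw BY time > 1500000000L;
deduped   = DISTINCT recent;
by_user   = GROUP deduped BY user;
counts    = FOREACH by_user GENERATE group AS user, COUNT(deduped) AS visits;
ordered   = ORDER counts BY visits DESC;

-- Store the result back to the distributed filesystem
STORE ordered INTO 'output/visit_counts';
```

Each statement names an intermediate relation; the runtime compiles the whole data flow into MapReduce jobs and handles scheduling and data movement across the cluster transparently, as described above.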
The HPCC data-intensive computing platform from LexisNexis Risk Solutions includes a new high-level declarative, data-centric programming language called ECL.
ECL also includes built-in pattern-matching capabilities: PATTERN statements can be combined to implement complex parsing operations or complete grammars from Backus–Naur form (BNF) definitions.
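A minimal sketch of how PATTERN definitions combine into a grammar and are applied with ECL's PARSE operation is shown below (the record layouts, attribute names, and sample data are illustrative assumptions, not drawn from the source text):

```ecl
// PATTERN definitions combine into grammar rules (BNF-style)
PATTERN ws      := [' ', '\t'];
PATTERN article := ['the', 'a', 'an'];
PATTERN word    := PATTERN('[a-zA-Z]+');
RULE nounPhrase := article ws word;

// Illustrative input dataset with a single text field
inRec  := {STRING line};
inData := DATASET([{'the quick brown fox'}], inRec);

// Output record capturing the text matched by the rule
outRec := RECORD
  STRING phrase := MATCHTEXT(nounPhrase);
END;

// PARSE applies the grammar to the line field of each record
matches := PARSE(inData, line, nounPhrase, outRec, FIRST);
OUTPUT(matches);
```

This is only a sketch of the PATTERN/RULE/PARSE style; complete grammars are built the same way, by composing PATTERN and RULE definitions into progressively larger rules.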