<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="6.x">Drupal-Biblio</source-app><ref-type>47</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Matei Zaharia</style></author><author><style face="normal" font="default" size="100%">Dhruba Borthakur</style></author><author><style face="normal" font="default" size="100%">Sen Sarma, Joydeep</style></author><author><style face="normal" font="default" size="100%">Khaled Elmeleegy</style></author><author><style face="normal" font="default" size="100%">Scott Shenker</style></author><author><style face="normal" font="default" size="100%">Ion Stoica</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling</style></title><secondary-title><style face="normal" font="default" size="100%">EuroSys</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2010</style></year><pub-dates><date><style  face="normal" font="default" size="100%">04/2010</style></date></pub-dates></dates><pub-location><style face="normal" font="default" size="100%">Paris, France</style></pub-location><abstract><style face="normal" font="default" size="100%">As organizations start to use data-intensive cluster computing systems like Hadoop and Dryad for more applications, there is a growing need to share clusters between users. However, there is a conflict between fairness in scheduling and data locality (placing tasks on nodes that contain their input data). We illustrate this problem through our experience designing a fair scheduler for a 600-node Hadoop cluster at Facebook.
To address the conflict between locality and fairness, we propose a simple algorithm called delay scheduling: when the job that should be scheduled next according to fairness cannot launch a local task, it waits for a small amount of time, letting other jobs launch tasks instead. We find that delay scheduling achieves nearly optimal data locality in a variety of workloads and can increase throughput by up to 2x while preserving fairness. In addition, the simplicity of delay scheduling makes it applicable under a wide variety of scheduling policies beyond fair sharing.
</style></abstract></record></records></xml>