Open Source Middleware for Data Desktop Grids

Gilles Fedak, Haiwu He / INRIA-Futurs

What is BitDew?

The BitDew framework is a programmable environment for management and distribution of data on computational Desktop Grids.

BitDew is a subsystem that can be easily integrated into Desktop Grid systems (XtremWeb, BOINC, Condor, etc.). Currently, Desktop Grids are mostly limited to embarrassingly parallel applications with few data dependencies. BitDew's objective is to broaden the use of Desktop Grids. Our approach is to break the "data wall" by providing, in a single package, the key P2P technologies (DHT, BitTorrent) and high-level programming interfaces. We first target Desktop Grids with peta-scale data systems: up to 1K files per node, with file sizes up to 1 GB, distributed to 10K to 100K nodes.

The BitDew framework will support data-intensive parameter sweep applications, long-running applications that require distributed checkpoint services, workflow applications and, possibly in the future, soft real-time and stream processing applications.

What Can I Do with BitDew?

BitDew offers programmers a simple API for creating, accessing, storing and moving data, even in highly dynamic and volatile environments.

The BitDew programming model relies on 5 abstractions to manage data: i) replication indicates how many copies of a data item should be available at the same time on the network, ii) fault-tolerance controls the policy applied when a machine crashes, iii) lifetime is an attribute, either absolute or relative to the existence of other data, which determines the life cycle of a data item in the system, iv) affinity drives the movement of data according to dependency rules, v) protocol gives the runtime environment hints about which protocol to use to distribute the data (http, ftp or bittorrent). Programmers define these simple criteria for every data item and let the BitDew runtime environment manage data creation, deletion, movement, replication and fault-tolerance operations.
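As an illustration only, the five criteria can be pictured as a small record attached to each data item. The sketch below is not the BitDew API: the class name Attributes, its field names and the helper missing_replicas are hypothetical, and the way fault-tolerance interacts with replication (copies on crashed hosts are replaced only when fault-tolerance is set) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union

PROTOCOLS = ("http", "ftp", "bittorrent")  # the protocols named in the text


@dataclass(frozen=True)
class Attributes:
    """Hypothetical record of the five per-data criteria (not the BitDew API)."""
    replication: int = 1                      # copies that should exist at the same time
    fault_tolerant: bool = False              # replace copies held by crashed hosts
    lifetime: Union[float, str, None] = None  # absolute deadline, or name of a data item it depends on
    affinity: Optional[str] = None            # name of a data item this one should follow
    protocol: str = "http"                    # hint for the transfer protocol

    def __post_init__(self):
        if self.replication < 1:
            raise ValueError("replication must be at least 1")
        if self.protocol not in PROTOCOLS:
            raise ValueError(f"unknown protocol: {self.protocol}")


def missing_replicas(attrs: Attributes, live_copies: int, crashed_copies: int) -> int:
    """Number of copies a runtime would still have to create.

    Assumption: with fault tolerance, copies on crashed hosts no longer count;
    without it, they are still counted and simply wait for the host to return.
    """
    counted = live_copies if attrs.fault_tolerant else live_copies + crashed_copies
    return max(0, attrs.replication - counted)
```

For example, a data item declared with Attributes(replication=3, fault_tolerant=True, protocol="bittorrent") that has two live copies and one copy on a crashed host would need one more copy under this model, and none if fault_tolerant were left unset.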

BitDew Architecture

The BitDew runtime environment is a flexible environment implementing the API. It relies on both centralized and distributed protocols for indexing, storage and transfers, providing reliability, scalability and high performance.

The architecture follows a classical three-tier schema commonly found in Desktop Grids: it divides the world into two sets of nodes, stable nodes and volatile nodes. Stable nodes run various independent services which compose the runtime environment: Data Repository (DR), Data Catalog (DC), Data Transfer (DT) and Data Scheduler (DS). We call these nodes the service hosts. Volatile nodes can either ask for storage resources (we call them client hosts) or offer their local storage (they are called reservoir hosts). Usually, programmers will not use the various D* services directly; instead they will use the API, which in turn hides the complexity of the internal protocols.
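The division of hosts and services described above can be summarised in a few lines of code. This is a toy model for illustration, not part of BitDew: the names Service, Role, role_of and uncovered_services are hypothetical.

```python
from enum import Enum


class Service(Enum):
    """The four D* services run by stable nodes."""
    DR = "Data Repository"
    DC = "Data Catalog"
    DT = "Data Transfer"
    DS = "Data Scheduler"


class Role(Enum):
    SERVICE = "service host"      # stable node running D* services
    CLIENT = "client host"        # volatile node asking for storage
    RESERVOIR = "reservoir host"  # volatile node offering its local storage


def role_of(stable: bool, offers_storage: bool) -> Role:
    """Classify a node according to the two-set division in the text."""
    if stable:
        return Role.SERVICE
    return Role.RESERVOIR if offers_storage else Role.CLIENT


def uncovered_services(deployment: dict) -> set:
    """Given {host name: set of Service}, return the services no host runs."""
    covered = set().union(*deployment.values())
    return set(Service) - covered
```

A deployment check such as uncovered_services({"s1": {Service.DR, Service.DC}, "s2": {Service.DT}}) would report that no service host runs the Data Scheduler.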

The BitDew runtime environment delegates a large number of operations to third-party components: 1) metadata is stored in a traditional SQL database, 2) data transfers are performed out-of-band by specialized file transfer protocols and 3) publishing and look-up of data replicas is provided by DHT protocols. One feature of the system is that all of these components can be replaced and plugged in by users, allowing them to select the most suitable subsystem according to their own criteria, such as performance, reliability and scalability.
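One common way to obtain this kind of replaceable component is to have the runtime depend only on a small interface and to register concrete back ends by name. The sketch below shows the idea for the transfer component alone. It is a generic plug-in pattern under assumed names (Transfer, InMemoryTransfer, Runtime, plug, move), not BitDew's actual mechanism, and the in-memory back end merely stands in for a real protocol.

```python
from abc import ABC, abstractmethod


class Transfer(ABC):
    """Hypothetical interface every transfer back end must implement."""

    @abstractmethod
    def send(self, name: str, payload: bytes, dest: dict) -> None:
        ...


class InMemoryTransfer(Transfer):
    """Stand-in back end: copies the payload into a dict acting as the remote store."""

    def send(self, name: str, payload: bytes, dest: dict) -> None:
        dest[name] = payload


class Runtime:
    """Toy runtime that looks up the back end registered for a protocol name."""

    def __init__(self) -> None:
        self._transfers = {}

    def plug(self, protocol: str, backend: Transfer) -> None:
        # Registering under an existing name replaces the previous back end.
        self._transfers[protocol] = backend

    def move(self, protocol: str, name: str, payload: bytes, dest: dict) -> None:
        try:
            backend = self._transfers[protocol]
        except KeyError:
            raise LookupError(f"no transfer back end plugged in for {protocol!r}") from None
        backend.send(name, payload, dest)
```

The same shape would apply to the metadata store and the DHT: the runtime calls the interface, and the user decides which implementation is plugged in.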

Download and Licence

BitDew is still in development, and the latest version can be accessed only through an svn interface. We have released a preliminary version, 0.0.1, under the GPL-v2 licence. Be warned that this version is unstable, lacks documentation and should only be used by experienced developers.

See BitDew in Action