This paper studies data routing strategies based on a
distributed Bloom filter. Data deduplication reduces
storage requirements and improves resource
utilization. Because the storage and computing
capacity of a single node is limited, cluster data
deduplication scales further than a single-node
design. However, it introduces new challenges: a
declining deduplication rate and the need to keep
load balanced among storage nodes.
To address these concerns, the study introduces a
novel data routing strategy based on a distributed
Bloom filter. The strategy uses the "super chunk" as
the basic data routing unit, which improves overall
system throughput. Following Broder's theorem, the k
smallest chunk fingerprints of each super chunk are
selected as its features and sent to the storage
nodes. These features are compared against the Bloom
filters to determine the best routing node,
taking into account node storage capacity
and memory maintenance.
The research then proceeds to the design and
implementation of system prototypes. Experiments
determine the parameters for each routing strategy,
and the strategies are then tested. The results
support the viability of the proposed strategies,
both theoretically and empirically.
Keywords: Data Routing, Load Balancing, Clustered Deduplication, Distributed Bloom Filters, Super Chunk, Deduplication Rate, Communication Overhead, Storage System, Cloud Computing, System Throughput.
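The routing step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names `BloomFilter`, `Node`, `super_chunk_features`, `route`, and `store`, the filter size, the hash construction, and the scoring rule (most feature hits first, then least stored data) are all assumptions made for illustration. The sketch also assumes that each node records only the routed features in its filter.

```python
import hashlib
import heapq


class BloomFilter:
    """Tiny Bloom filter over byte strings (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class Node:
    """Hypothetical storage node: a feature filter plus a chunk index."""

    def __init__(self, name):
        self.name = name
        self.bloom = BloomFilter()
        self.fingerprints = set()


def fingerprint(chunk):
    return hashlib.sha1(chunk).digest()


def super_chunk_features(chunks, k):
    """Return the k smallest distinct chunk fingerprints, in ascending order."""
    return heapq.nsmallest(k, {fingerprint(c) for c in chunks})


def route(features, nodes):
    """Prefer the node matching the most features; break ties by least data."""
    def score(node):
        hits = sum(1 for f in features if f in node.bloom)
        return (hits, -len(node.fingerprints))
    return max(nodes, key=score)


def store(chunks, nodes, k=4):
    """Route one super chunk, then update the chosen node's state."""
    features = super_chunk_features(chunks, k)
    target = route(features, nodes)
    for f in features:
        target.bloom.add(f)
    target.fingerprints.update(fingerprint(c) for c in chunks)
    return target
```

In this sketch a super chunk that shares features with earlier data is sent to the node that already holds that data, while unseen super chunks go to the least loaded node. A real system would replace the tie-break with whatever capacity and memory criteria the routing strategy defines.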