Merged
14 changes: 14 additions & 0 deletions docs/source/design/mooncake-store.md
@@ -100,6 +100,7 @@ The data structure details of `ReplicateConfig` are as follows:
struct ReplicateConfig {
size_t replica_num{1}; // Total number of replicas for the object
bool with_soft_pin{false}; // Whether to enable soft pin mechanism for this object
bool with_hard_pin{false}; // Whether to enable hard pin (never evicted)
std::string preferred_segment{}; // Preferred segment for allocation
};
```
@@ -688,6 +689,18 @@ There are two startup parameters in `master_service` related to the soft pin mec

Notably, soft-pinned objects can still be removed using APIs such as `Remove` or `RemoveAll`.

### Hard Pin

For objects that must never be evicted under any circumstances (e.g., model weights, critical metadata), Mooncake Store provides a hard pin mechanism. Unlike soft-pinned objects, hard-pinned objects are permanently protected from eviction: they are never selected as eviction candidates, regardless of memory pressure.

Hard pin is set at object creation time through the `with_hard_pin` field in `ReplicateConfig` and cannot be changed afterward. Hard-pinned objects can only be removed explicitly via `Remove` (with force) or `RemoveAll`.

Key differences from soft pin:

- Hard pin never expires. Soft pin status is removed after a configurable TTL if the object is not accessed.
- Hard-pinned objects are completely skipped during eviction. Soft-pinned objects may still be evicted when no other candidates are available.
- Hard pin is immutable once set. Soft pin status is automatically refreshed on access.

### Zombie Object Cleanup

If a Client crashes or experiences a network failure after sending a `PutStart` request but before it can send the corresponding `PutEnd` or `PutRevoke` request to the Master, the object initiated by `PutStart` enters a "zombie" state: it is neither usable nor deletable. Such zombie objects not only consume storage space but also prevent subsequent `Put` operations on the same keys. To mitigate these issues, the Master records the start time of each `PutStart` request and uses two timeout thresholds, `put_start_discard_timeout` and `put_start_release_timeout`, to clean up zombie objects.
@@ -712,6 +725,7 @@ The preferred segment allocation feature is implemented through the `AllocationS
struct ReplicateConfig {
size_t replica_num{1}; // Total number of replicas for the object
bool with_soft_pin{false}; // Whether to enable soft pin mechanism for this object
bool with_hard_pin{false}; // Whether to enable hard pin (never evicted)
std::string preferred_segment{}; // Preferred segment for allocation
};
```
12 changes: 9 additions & 3 deletions mooncake-store/include/master_service.h
@@ -484,12 +484,13 @@ class MasterService {
const UUID& client_id_,
const std::chrono::system_clock::time_point put_start_time_,
size_t value_length, std::vector<Replica>&& reps,
-bool enable_soft_pin)
+bool enable_soft_pin, bool enable_hard_pin = false)
: client_id(client_id_),
put_start_time(put_start_time_),
size(value_length),
lease_timeout(),
soft_pin_timeout(std::nullopt),
hard_pinned(enable_hard_pin),
replicas_(std::move(reps)) {
MasterMetricManager::instance().inc_key_count(1);
if (enable_soft_pin) {
@@ -516,6 +517,7 @@ class MasterService {
mutable std::optional<std::chrono::system_clock::time_point>
soft_pin_timeout GUARDED_BY(lock); // optional soft pin, only
// set for vip objects
const bool hard_pinned{false}; // immutable, set at creation

void AddReplicas(std::vector<Replica>&& replicas) {
replicas_.insert(replicas_.end(),
@@ -684,6 +686,8 @@ class MasterService {
return soft_pin_timeout && now < *soft_pin_timeout;
}

bool IsHardPinned() const { return hard_pinned; }

// Check if the metadata is valid
// Valid means it has at least one valid replica and size is greater
// than 0
@@ -897,15 +901,17 @@ class MasterService {
}

void Create(const UUID& client_id, uint64_t total_length,
-std::vector<Replica> replicas, bool enable_soft_pin) {
+std::vector<Replica> replicas, bool enable_soft_pin,
+bool enable_hard_pin = false) {
if (Exists()) {
throw std::logic_error("Already exists");
}
const auto now = std::chrono::system_clock::now();
auto result = shard_guard_->metadata.emplace(
std::piecewise_construct, std::forward_as_tuple(key_),
std::forward_as_tuple(client_id, now, total_length,
-std::move(replicas), enable_soft_pin));
+std::move(replicas), enable_soft_pin,
+enable_hard_pin));
it_ = result.first;
}

2 changes: 2 additions & 0 deletions mooncake-store/include/replica.h
@@ -84,6 +84,7 @@ inline std::ostream& operator<<(std::ostream& os,
struct ReplicateConfig {
size_t replica_num{1};
bool with_soft_pin{false};
bool with_hard_pin{false}; // Hard pin: object cannot be evicted
std::vector<std::string>
preferred_segments{}; // Preferred segments for allocation
std::string preferred_segment{}; // Deprecated: Single preferred segment
@@ -94,6 +95,7 @@ struct ReplicateConfig {
const ReplicateConfig& config) noexcept {
os << "ReplicateConfig: { replica_num: " << config.replica_num
<< ", with_soft_pin: " << config.with_soft_pin
<< ", with_hard_pin: " << config.with_hard_pin
<< ", preferred_segments: [";
for (size_t i = 0; i < config.preferred_segments.size(); ++i) {
os << config.preferred_segments[i];
46 changes: 33 additions & 13 deletions mooncake-store/src/master_service.cpp
@@ -766,7 +766,7 @@ auto MasterService::PutStart(const UUID& client_id, const std::string& key,
shard->metadata.emplace(
std::piecewise_construct, std::forward_as_tuple(key),
std::forward_as_tuple(client_id, now, total_length, std::move(replicas),
-config.with_soft_pin));
+config.with_soft_pin, config.with_hard_pin));
// Also insert the metadata into processing set for monitoring.
shard->processing_keys.insert(key);

@@ -2886,6 +2886,10 @@ void MasterService::BatchEvict(double evict_ratio_target,
candidates; // can be removed
for (auto it = shard->metadata.begin(); it != shard->metadata.end();
it++) {
// Hard-pinned objects are never evicted
if (it->second.IsHardPinned()) {
continue;
}
// Skip objects that are not expired or have incomplete replicas
if (!it->second.IsLeaseExpired(now) ||
!can_evict_replicas(it->second)) {
@@ -2920,7 +2924,8 @@ void MasterService::BatchEvict(double evict_ratio_target,
while (it != shard->metadata.end()) {
// Skip objects that are not allowed to be evicted in the first
// pass
-if (!it->second.IsLeaseExpired(now) ||
+if (it->second.IsHardPinned() ||
+!it->second.IsLeaseExpired(now) ||
it->second.IsSoftPinned(now) ||
!can_evict_replicas(it->second)) {
++it;
@@ -2983,7 +2988,8 @@ void MasterService::BatchEvict(double evict_ratio_target,
(start_idx + i) % kNumShards);
auto it = shard->metadata.begin();
while (it != shard->metadata.end() && target_evict_num > 0) {
-if (it->second.lease_timeout <= target_timeout &&
+if (!it->second.IsHardPinned() &&
+it->second.lease_timeout <= target_timeout &&
!it->second.IsSoftPinned(now) &&
can_evict_replicas(it->second)) {
// Evict this object
@@ -3025,9 +3031,9 @@ void MasterService::BatchEvict(double evict_ratio_target,

auto it = shard->metadata.begin();
while (it != shard->metadata.end() && target_evict_num > 0) {
-// Skip objects that are not expired or have incomplete
-// replicas
-if (!it->second.IsLeaseExpired(now) ||
+// Skip hard-pinned or not-yet-expired objects
+if (it->second.IsHardPinned() ||
+!it->second.IsLeaseExpired(now) ||
!can_evict_replicas(it->second)) {
++it;
continue;
@@ -3485,7 +3491,8 @@ MasterService::MetadataSerializer::DeserializeShard(const msgpack::object& obj,
std::forward_as_tuple(
metadata_ptr->client_id, metadata_ptr->put_start_time,
metadata_ptr->size, metadata_ptr->PopReplicas(),
-metadata_ptr->soft_pin_timeout.has_value()));
+metadata_ptr->soft_pin_timeout.has_value(),
+metadata_ptr->IsHardPinned()));

it->second.lease_timeout = metadata_ptr->lease_timeout;
it->second.soft_pin_timeout = metadata_ptr->soft_pin_timeout;
@@ -3500,10 +3507,12 @@ MasterService::MetadataSerializer::SerializeMetadata(
MsgpackPacker& packer) const {
// Pack ObjectMetadata using array structure for efficiency
// Format: [client_id, put_start_time, size, lease_timeout,
-// has_soft_pin_timeout, soft_pin_timeout, replicas_count, replicas...]
+// has_soft_pin_timeout, soft_pin_timeout, replicas_count, replicas...,
+// hard_pinned]

-size_t array_size = 7; // size, lease_timeout, has_soft_pin_timeout,
-// soft_pin_timeout, replicas_count
+size_t array_size = 8; // client_id, put_start_time, size, lease_timeout,
Collaborator

Could we avoid hardcoding this? For example, `size_t array_size = sizeof(struct xxx) + sizeof(struct xxx)`.

+// has_soft_pin_timeout, soft_pin_timeout,
+// replicas_count + hard_pinned
Comment on lines +3513 to +3515
Contributor

medium

The comment for the `array_size` calculation could be made clearer. Currently, it lists `replicas_count + hard_pinned`, which might be misread as `hard_pinned` being added to `replicas_count`. It would be more precise to list `hard_pinned` as a separate fixed field, making it clear that 8 is the base count of fixed fields before the variable number of replica elements is added.

Suggested change
-size_t array_size = 8; // client_id, put_start_time, size, lease_timeout,
-// has_soft_pin_timeout, soft_pin_timeout,
-// replicas_count + hard_pinned
+size_t array_size = 8; // client_id, put_start_time, size, lease_timeout,
+// has_soft_pin_timeout, soft_pin_timeout,
+// replicas_count, hard_pinned

array_size += metadata.CountReplicas(); // One element per replica
packer.pack_array(array_size);

@@ -3552,6 +3561,8 @@ MasterService::MetadataSerializer::SerializeMetadata(
}
}

packer.pack(metadata.IsHardPinned());

return {};
}

@@ -3567,6 +3578,7 @@ MasterService::MetadataSerializer::DeserializeMetadata(

// Need at least 7 elements: client_id, put_start_time, size, lease_timeout,
// has_soft_pin_timeout, soft_pin_timeout, replicas_count
// (8th element = hard_pinned is optional for backward compat)
if (obj.via.array.size < 7) {
return tl::unexpected(SerializationError(
ErrorCode::DESERIALIZE_FAIL,
@@ -3599,8 +3611,10 @@ MasterService::MetadataSerializer::DeserializeMetadata(
// Deserialize replicas count
uint32_t replicas_count = array[index++].as<uint32_t>();

-// Check if array size matches replicas_count
-if (obj.via.array.size != 7 + replicas_count) {
+// Array size: 7 + replicas_count (old format) or 8 + replicas_count (new
+// format with hard_pinned)
+if (obj.via.array.size != 7 + replicas_count &&
+obj.via.array.size != 8 + replicas_count) {
return tl::unexpected(SerializationError(
ErrorCode::DESERIALIZE_FAIL,
"deserialize ObjectMetadata array size mismatch"));
@@ -3619,13 +3633,19 @@ MasterService::MetadataSerializer::DeserializeMetadata(
replicas.emplace_back(std::move(*result.value()));
}

// Deserialize hard_pinned (if present, otherwise default to false)
bool is_hard_pinned = false;
if (index < obj.via.array.size) {
is_hard_pinned = array[index++].as<bool>();
}

// Create ObjectMetadata instance
bool enable_soft_pin = has_soft_pin_timeout;
auto metadata = std::make_unique<ObjectMetadata>(
client_id,
std::chrono::system_clock::time_point(
std::chrono::milliseconds(put_start_time_timestamp)),
-size, std::move(replicas), enable_soft_pin);
+size, std::move(replicas), enable_soft_pin, is_hard_pinned);
metadata->lease_timeout = std::chrono::system_clock::time_point(
Comment on lines 3642 to 3649

Copilot AI Mar 23, 2026

`DeserializeMetadata()` correctly parses the optional `hard_pinned` flag and passes it into the temporary `ObjectMetadata` instance, but the restore path in `DeserializeShard()` reconstructs shard metadata by emplacing a new `ObjectMetadata` from `metadata_ptr` without forwarding `hard_pinned`, so hard-pinned objects will come back as not hard pinned after restoring a snapshot. Please ensure the shard-level reconstruction preserves `hard_pinned` (either pass it into the constructor or set the field under lock after emplace).

std::chrono::milliseconds(lease_timestamp));
