Tencent/flare

About scheduler_lock

4kangjc opened this issue · 11 comments

// This lock is held when the fiber is in state-transition (e.g., from running
// to suspended). This is required since it's inherently racy when we add
// ourselves into some wait-chain (and are eventually woken up by someone
// else) and go to sleep. The one who wakes us up can be running in a different
// pthread, and therefore might wake us up even before we have actually gone
// to sleep. So we always grab this lock before transitioning the fiber's
// state, to ensure that nobody else can change the fiber's state concurrently.
//
// For waking up a fiber, this lock is grabbed by whoever the waker is;
// For a fiber to go to sleep, this lock is grabbed by the fiber itself and
// released by *`SchedulingGroup`* (by the time we're sleeping, we cannot
// release the lock ourselves.).
//
// This lock also protects us from being woken up by several pthreads
// concurrently (in case we waited on several waitables and have not removed
// ourselves from all of them before more than one of them has fired.).
Spinlock scheduler_lock;

// Argument `context` (i.e., `this`) is only used the first time the context
// is jumped to (in `FiberProc`).
jump_context(&caller->state_save_area, state_save_area, this);

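Perhaps we could make use of this `context` argument: switch to the target fiber first, and only then change the caller fiber's state. Would the lock then no longer be needed for the state change?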

inline void FiberEntity::Resume() noexcept {
  SetCurrentFiberEntity(this);
  state = FiberState::Running;
  // Pass the caller as the `context` argument.
  auto caller_ = jump_context(&caller->state_save_area, state_save_area, caller);
  // `caller_` is nullptr when the fiber has returned.
  if (caller_) {
    static_cast<FiberEntity*>(caller_)->state = FiberState::Waiting;
  }
  ...
}

static void FiberProc(void* context) {
  auto caller = reinterpret_cast<FiberEntity*>(context);
  caller->state = FiberState::Waiting;
  //....
  current_fiber->state = FiberState::Dead;
  GetMasterFiberEntity()->M_return([](){...});
}

void FiberEntity::M_return(Function<void()>&& cb) noexcept {
  // set `resume_proc` ....
  SetCurrentFiberEntity(this);
  state = FiberState::Running;
  // Pass nullptr as the `context` argument.
  jump_context(&caller->state_save_area, state_save_area, nullptr);
}

Setting current_fiber before the yield should be fine, right? If that is a problem, `context` could be set to this instead:

struct jump_context_data {
  FiberEntity* self, *caller;
};

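This is not (only) about FiberProc (the first run after a fiber is created). In scenarios involving fiber::Mutex and the like, once the fiber has been added to a wait-chain from pthread1, it may immediately be popped from the chain by pthread2 and scheduled to run. At that point some synchronization mechanism is needed so that "pthread2 waits until pthread1 no longer touches the fiber_entity". Whether that is a spinlock or a reused field (e.g., fiber->state), the performance impact should end up about the same, and of the two, the spinlock makes the overall code easier to understand.

As for performance, the main cost of fiber scheduling is waking up pthread2. One atomic operation more or less here should not be noticeable in practice.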
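
The handoff described above can be sketched with plain threads. This is only an illustration under stated assumptions, not flare's implementation: `ToyFiber`, `context_saved`, `wait_chain` and `RunHandoffOnce` are hypothetical names, a `std::thread` stands in for pthread2, and a boolean stands in for the context switch. The "sleeper" takes `scheduler_lock` before publishing itself to the wait-chain, so a waker that pops it immediately still has to wait for the lock and can only observe a fully suspended fiber.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for illustration only; these are not flare's types.
enum class FiberState { Ready, Running, Waiting };

struct Spinlock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;
  void lock() {
    while (flag.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
  }
  void unlock() { flag.clear(std::memory_order_release); }
};

struct ToyFiber {
  Spinlock scheduler_lock;
  FiberState state = FiberState::Running;
  bool context_saved = false;  // Stands in for "registers are saved".
};

// Returns true if the waker only ever saw a fully suspended fiber.
bool RunHandoffOnce() {
  ToyFiber fiber;
  std::atomic<ToyFiber*> wait_chain{nullptr};  // One-slot "wait-chain".
  bool waker_saw_suspended_fiber = false;

  // "pthread2": pops the fiber as soon as it shows up in the wait-chain.
  std::thread waker([&] {
    ToyFiber* f = nullptr;
    while ((f = wait_chain.load(std::memory_order_acquire)) == nullptr) {
      std::this_thread::yield();
    }
    f->scheduler_lock.lock();  // Blocks until the sleeper side is done.
    waker_saw_suspended_fiber =
        f->state == FiberState::Waiting && f->context_saved;
    f->state = FiberState::Ready;
    f->scheduler_lock.unlock();
  });

  // "pthread1": the fiber puts itself to sleep.
  fiber.scheduler_lock.lock();
  wait_chain.store(&fiber, std::memory_order_release);  // Visible to waker.
  fiber.state = FiberState::Waiting;
  fiber.context_saved = true;     // Stands in for jumping away.
  fiber.scheduler_lock.unlock();  // Done by the scheduler in the real design.

  waker.join();
  assert(fiber.state == FiberState::Ready);
  return waker_saw_suspended_fiber;
}
```

Calling `RunHandoffOnce()` repeatedly should always return true. If the sleeper published itself to `wait_chain` without holding the lock, the waker could run between the publish and the state change and see a fiber that is still `Running`.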

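However, with the current approach the fiber's state is tightly coupled to SchedulingGroup: you need it in order to obtain the fiber's state correctly.

I may be misremembering something, but the fiber's state should be accessible as long as you can acquire scheduler_lock. It does not need to be bound to a scheduling_group.

For example, SchedulingGroup::RemoteAcquireFiber is used to migrate fibers between scheduling groups.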

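Oh, it seems a fiber used on its own cannot yield anyway. It has to work together with a scheduling_group to yield, so no change to the waiting state is involved. What I originally meant was that when a fiber is used on its own and yields, its state does not become waiting.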

// GetId()?
Couldn't this one in this_fiber just return the debug_id?

https://github.com/Tencent/flare/blob/master/flare/fiber/alternatives.h#L35

This one. It is used to get the current thread ID without relying on that `__const__` hack.

Does the thread ID have the same problem as errno?

// GetId()?
Couldn't this one in this_fiber just return the debug_id?

// `GetId()`?

std::uint64_t debugging_fiber_id;

Does the thread ID have the same problem as errno?

Yes. In fact, anything that uses thread-local storage can have this problem.
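
A small sketch of why thread-local storage is the common factor, assuming (as I understand the issue) that the trouble comes from a fiber resuming on a different pthread while code still holds an address computed on the previous one. The names `tls_slot`, `TlsAddressOnThisThread` and `TlsAddressOnNewThread` are hypothetical and are not part of flare. The sketch only shows that the same `thread_local` variable lives at a different address on each thread, so an address cached before a migration would point at the wrong thread's copy.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical thread-local variable standing in for errno, a cached
// thread ID, or any other per-thread slot.
thread_local int tls_slot = 0;

// Address of the calling thread's copy of `tls_slot`, as an integer so it
// can be compared safely after the other thread has exited.
std::uintptr_t TlsAddressOnThisThread() {
  return reinterpret_cast<std::uintptr_t>(&tls_slot);
}

// Address of `tls_slot` as seen from a freshly started thread.
std::uintptr_t TlsAddressOnNewThread() {
  std::uintptr_t address = 0;
  std::thread t([&] { address = TlsAddressOnThisThread(); });
  t.join();
  assert(address != 0);
  return address;
}
```

While the calling thread is alive, `TlsAddressOnThisThread()` and `TlsAddressOnNewThread()` return different values, because each thread has its own copy of the variable.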

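Couldn't this one in this_fiber just return the debug_id?

This ID can be returned if there is a need for it. It is the fiber's ID. But the purpose of the test in async_test.cc is to verify that "a fiber run via Dispatch runs directly in the caller's pthread", so what it compares is the thread ID.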
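
The kind of check described here can be sketched with standard threads. This is not the code in async_test.cc and uses no flare APIs: `RunInline` and `RunOnNewThread` are hypothetical helpers, where running inline plays the role of Dispatch and a new `std::thread` plays the role of any other execution policy. The point is only that a thread ID, unlike a fiber ID, tells you whether the work ran on the caller's pthread.

```cpp
#include <cassert>
#include <functional>
#include <thread>

using Probe = std::function<std::thread::id()>;

// Hypothetical helper: runs `probe` directly on the calling thread.
std::thread::id RunInline(const Probe& probe) {
  assert(probe);
  return probe();
}

// Hypothetical helper: runs `probe` on a separate thread and returns
// whatever thread ID it observed there.
std::thread::id RunOnNewThread(const Probe& probe) {
  assert(probe);
  std::thread::id result;
  std::thread t([&] { result = probe(); });
  t.join();
  return result;
}
```

With a probe such as `[] { return std::this_thread::get_id(); }`, the inline run reports the caller's thread ID and the run on a new thread reports a different one.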

No, that is a separate topic. What I mean is: why is there no GetId function in this_fiber.h? Couldn't it be implemented using fiber_entity's debugging_fiber_id?

// `GetId()`?

I see there is only a question mark here.