TimelyDataflow/timely-dataflow

Make logging more flexible

bmmcq opened this issue · 6 comments

bmmcq commented

We use the timely v0.7.0 release.

As a user, I think it would be reasonable to register a logger under a name, for example "dataflow_a":

        // Register timely worker logging.
        worker.log_register().insert::<TimelyEvent,_>("dataflow_a", |_time, data|
            data.iter().for_each(|x| println!("LOG1: {:?}", x))
        );

and then use the name "dataflow_a" to get the logger for this dataflow inside timely. This way, we can use a different logger for each dataflow:

    // We could then call dataflow_using("dataflow_a", ...).
    pub fn dataflow_using<T: Timestamp, R, F: FnOnce(&mut V, &mut Child<Self, T>)->R, V: Any+'static>(&mut self, name: &'static str, mut resources: V, func: F) -> R {

        let addr = vec![self.allocator.borrow().index()];
        let dataflow_index = self.allocate_dataflow_index();
        // Look up the logger by name instead of always using the default "timely" logger.
        let mut logging = self.logging.borrow_mut().get(name);

        // ... the rest is the same as the current release.
    }

In addition, could you add a SubgraphEvent like:

    /// Subgraph start or stop.
    #[derive(Abomonation, Debug, Clone, Hash, Eq, PartialEq, Ord, PartialOrd)]
    pub struct SubgraphEvent {
        /// Worker-unique identifier for the subgraph.
        pub addr: Vec<usize>,

        /// `Start` if the subgraph is starting, `Stop` if it is stopping in this step.
        pub start_stop: StartStop,
    }

and log these events each time the subgraph is scheduled:

    // Method of Subgraph:
    fn pull_internal_progress(&mut self, consumed: &mut [ChangeBatch<TOuter>],
                                         internal: &mut [ChangeBatch<TOuter>],
                                         produced: &mut [ChangeBatch<TOuter>]) -> bool
    {
        // Log a Start event before doing any work.
        self.logging.as_mut().map(|l| l.log(::logging::TimelyEvent::Subgraph(::logging::SubgraphEvent {
            addr: self.path.clone(),
            start_stop: ::logging::StartStop::Start,
        }))).unwrap_or(());

        // ... subgraph logic, same as the current release ...

        // Log a Stop event, recording whether the subgraph remains active.
        self.logging.as_mut().map(|l| l.log(::logging::TimelyEvent::Subgraph(::logging::SubgraphEvent {
            addr: self.path.clone(),
            start_stop: ::logging::StartStop::Stop { activity: active },
        }))).unwrap_or(());

        active
    }

so that when we analyze these logs, we can tell how much time these method calls cost.

Let's see if I understand!

First up, logging is definitely not locked down, and we can make lots of changes there. Things have quieted down recently because they got to where we needed them, but no reason not to change things.

I think both of these things can be done with the current code, although the current ways of doing them are not especially appealing. But let's break those two questions apart ("can it be done" versus "is this the right way").

  1. You should be able to do per-dataflow logging already, by swapping in the logger that you want for the "timely" logger. You can swap the logger in before your call to dataflow and then swap the old logger back in afterwards. Arguably, we may want to prevent this to avoid people messing up the logging, but at the moment it should be possible.

    For example, examples/logging-send.rs does this when it changes the logger to print output. The result of insert should be an Option<_> containing the old logger, so that it can be replaced.

    We can for sure debate whether this is the best way to do things (very dangerous!) but does this serve the need that you have for dropping in a custom logger?

  2. We can for sure add such an event, but I think you should be able to extract that information in post-processing, as there will be a Schedule event for the subgraph itself, with both a Start and Stop event. You would need to look at the addresses to pick out which of these events are subgraphs, but they should be there in the logs with all the other Schedule events (of which there are way too many! we hope that event-driven scheduling will improve this). A sketch of that post-processing follows below.
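    Concretely (the event shapes here are simplified, hypothetical stand-ins for the logged Operates and Schedule events, not the actual timely types), one could pair each Start with its following Stop per operator id and use the recorded addresses to attribute the elapsed time:

        use std::collections::HashMap;
        use std::time::Duration;

        // Simplified stand-ins for the logged event types (the real ones carry more fields).
        #[derive(Clone, Copy, Debug)]
        enum StartStop { Start, Stop { activity: bool } }

        #[derive(Clone, Debug)]
        struct OperatesEvent { id: usize, addr: Vec<usize> }

        #[derive(Clone, Debug)]
        struct ScheduleEvent { id: usize, start_stop: StartStop }

        #[derive(Clone, Debug)]
        enum Event { Operates(OperatesEvent), Schedule(ScheduleEvent) }

        /// Total time spent between Start and Stop, keyed by operator address.
        fn schedule_durations(log: &[(Duration, Event)]) -> HashMap<Vec<usize>, Duration> {
            let mut addrs: HashMap<usize, Vec<usize>> = HashMap::new();   // id -> address
            let mut open: HashMap<usize, Duration> = HashMap::new();      // id -> pending Start time
            let mut totals: HashMap<Vec<usize>, Duration> = HashMap::new();

            for (time, event) in log {
                match event {
                    Event::Operates(op) => { addrs.insert(op.id, op.addr.clone()); }
                    Event::Schedule(s) => match s.start_stop {
                        StartStop::Start => { open.insert(s.id, *time); }
                        StartStop::Stop { .. } => {
                            if let Some(start) = open.remove(&s.id) {
                                let addr = addrs.get(&s.id).cloned().unwrap_or_default();
                                *totals.entry(addr).or_default() += *time - start;
                            }
                        }
                    },
                }
            }
            totals
        }

    Which of these addresses belong to subgraphs should be recoverable from the Operates events themselves: an address that is a proper prefix of other operators' addresses names a scope rather than a leaf operator.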

What do you think about these two? Quick fixes for your problems, perhaps. But we can leave the issue open too to think about how better to expose idioms for these.

bmmcq commented

For the first problem, what we want is to run multiple dataflows at the same time but with different logging formats, so we hope to use a dataflow name to select the appropriate logger.

For the second problem: as far as we know, only subgraphs in nested scopes are scheduled as operators. For the subgraphs at the top-level scope, we think no Schedule events are logged.

I see!

I think the first problem can still be handled by swapping out the logger, calling dataflow(), then swapping back in the original logger. You can have multiple concurrent dataflows still, each with a different logger (changing the logger only changes the logger that is captured by new dataflows and operators, not the logger used by previously built dataflows).
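Roughly, the pattern I have in mind looks like the following sketch, which follows the insert usage from earlier in the thread (the exact signatures may differ a bit across timely versions):

    extern crate timely;

    use timely::logging::TimelyEvent;

    fn main() {
        timely::execute_from_args(std::env::args(), |worker| {

            // Logger captured by the first dataflow.
            worker.log_register().insert::<TimelyEvent,_>("timely", |_time, data|
                data.iter().for_each(|x| println!("DATAFLOW_A: {:?}", x))
            );
            worker.dataflow::<u64,_,_>(|_scope| {
                // ... build dataflow A here ...
            });

            // Swap in a different logger before building the second dataflow.
            worker.log_register().insert::<TimelyEvent,_>("timely", |_time, data|
                data.iter().for_each(|x| println!("DATAFLOW_B: {:?}", x))
            );
            worker.dataflow::<u64,_,_>(|_scope| {
                // ... build dataflow B here ...
            });

            // Both dataflows now run concurrently, each reporting through the
            // logger it captured when it was built; the Option returned by
            // insert holds the previous logger if it needs to be restored.
        }).unwrap();
    }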

For the second problem, this is a good point! Dataflows are almost surely not logged as their execution starts, and we can fix that. We could either leave it as a Schedule event, or put together a new Dataflow event, or build a Subgraph event and port the existing subgraph scheduling events to use it (though, it would be redundant with the Schedule event produced by any parent scope, when it is not the root scope of a dataflow).

bmmcq commented

Thanks for the suggestion on the first logging problem; it really works, although it is a little strange.

We also want to trace message transfers, but the MessageEvent in logging is not correct for this: the receiver of a message does not actually know where the message came from, so we cannot match the MessageEvent(send) with the MessageEvent(recv) for the same message:

https://github.com/frankmcsherry/timely-dataflow/blob/d8fd20746a4f205ca1dd079f2a7f05d642a5c0d8/src/dataflow/channels/mod.rs#L44
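To illustrate what we are after (the names below are hypothetical, not the actual timely logging types): if both sides of a transfer logged the sending worker, the receiving worker, the channel, and a per-channel sequence number, the send and receive events could be joined on that key:

    use std::collections::HashMap;
    use std::time::Duration;

    // Hypothetical key that both the send-side and receive-side events would carry.
    #[derive(Clone, Debug, Hash, Eq, PartialEq)]
    struct MessageKey {
        channel: usize, // channel identifier
        source: usize,  // sending worker
        target: usize,  // receiving worker
        seq: usize,     // sequence number within (channel, source, target)
    }

    /// Pairs each receive timestamp with the matching send timestamp.
    fn match_transfers(
        sends: &[(Duration, MessageKey)],
        recvs: &[(Duration, MessageKey)],
    ) -> Vec<(MessageKey, Duration, Duration)> {
        let sent: HashMap<_, _> = sends.iter().map(|(t, k)| (k.clone(), *t)).collect();
        recvs.iter()
            .filter_map(|(t_recv, k)| sent.get(k).map(|t_send| (k.clone(), *t_send, *t_recv)))
            .collect()
    }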

We hope you will also fix this. Thanks.

Yes, I agree it is strange, and we may want safer interfaces that make it easier to use a logger for one dataflow and harder to drop the default timely logging. :)

The MessageEvent thing is for sure a bug. I'll put that on the queue and try to get to it soon!

Candidate fixes are now available in #213.