Review masking and add traces when things don't cancel timely #2768

Merged
merged 14 commits on Mar 11, 2022
3 changes: 1 addition & 2 deletions ghcide/src/Development/IDE/Core/FileExists.hs
@@ -109,8 +109,7 @@ modifyFileExists :: IdeState -> [(NormalizedFilePath, FileChangeType)] -> IO ()
modifyFileExists state changes = do
FileExistsMapVar var <- getIdeGlobalState state
-- Masked to ensure that the previous values are flushed together with the map update
-- update the map
mask_ $ join $ atomicallyNamed "modifyFileExists" $ do
join $ mask_ $ atomicallyNamed "modifyFileExists" $ do
Collaborator:
This change doesn't make sense to me. I think this just makes the mask apply to the atomic execution of the STM block, which is already atomic. Previously it ensured that we would perform the effects of the returned action as well without being interrupted. But now I think it doesn't do that, and indeed maybe does nothing?

Collaborator Author:
It still prevents the STM transaction from being interrupted by an edit, which is important to accurately track file dirtiness.

Collaborator:
But we can end up not calling recordDirtyKeys, since that happens in the IO action afterwards. Is that important? Maybe the comment could say.

Collaborator Author:
The IO is just logging; recordDirtyKeys mutates the collection inside the STM transaction. That said, I appreciate that this is confusing, and perhaps the mask_ was better at the top level for clarity.

forM_ changes $ \(f,c) ->
case fromChange c of
Just c' -> STM.focus (Focus.insert c') f var
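A minimal standalone sketch of the distinction discussed in the thread above (the helper updateAndReport and the main wrapper are illustrative, not part of the patch): with mask_ $ join $ atomically ... both the transaction and the IO action it returns run with asynchronous exceptions masked, whereas with join $ mask_ $ atomically ... only the transaction is masked and the returned action can be interrupted before it runs.

```haskell
import Control.Concurrent.STM
import Control.Exception (mask_)
import Control.Monad (join)

-- Stand-in for the STM block in modifyFileExists: update shared state and
-- return a follow-up IO action to run afterwards.
updateAndReport :: TVar Int -> STM (IO ())
updateAndReport var = do
  modifyTVar' var (+ 1)
  n <- readTVar var
  pure (putStrLn ("map updated, new size " ++ show n))

main :: IO ()
main = do
  var <- newTVarIO 0
  -- Old shape: the transaction *and* the returned IO action run under mask_.
  mask_ $ join $ atomically (updateAndReport var)
  -- New shape: only the transaction is masked; an async exception can arrive
  -- between the commit and the follow-up action.
  join $ mask_ $ atomically (updateAndReport var)
```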
24 changes: 15 additions & 9 deletions ghcide/src/Development/IDE/Core/Shake.hs
@@ -151,7 +151,6 @@ import Development.IDE.Types.Options
import Development.IDE.Types.Shake
import qualified Focus
import GHC.Fingerprint
import GHC.Generics
import HieDb.Types
import Ide.Plugin.Config
import qualified Ide.PluginUtils as HLS
@@ -173,6 +172,7 @@ data Log
= LogCreateHieDbExportsMapStart
| LogCreateHieDbExportsMapFinish !Int
| LogBuildSessionRestart !String ![DelayedActionInternal] !(HashSet Key) !Seconds !(Maybe FilePath)
| LogBuildSessionRestartTakingTooLong !Seconds
| LogDelayedAction !(DelayedAction ()) !Seconds
| LogBuildSessionFinish !(Maybe SomeException)
| LogDiagsDiffButNoLspEnv ![FileDiagnostic]
@@ -192,6 +192,8 @@ instance Pretty Log where
, "Action Queue:" <+> pretty (map actionName actionQueue)
, "Keys:" <+> pretty (map show $ HSet.toList keyBackLog)
, "Aborting previous build session took" <+> pretty (showDuration abortDuration) <+> pretty shakeProfilePath ]
LogBuildSessionRestartTakingTooLong seconds ->
"Build restart is taking too long (" <> pretty seconds <> " seconds)"
LogDelayedAction delayedAction duration ->
hsep
[ "Finished:" <+> pretty (actionName delayedAction)
@@ -322,7 +324,7 @@ getVirtualFile nf = do
-- Take a snapshot of the current LSP VFS
vfsSnapshot :: Maybe (LSP.LanguageContextEnv a) -> IO VFS
vfsSnapshot Nothing = pure $ VFS mempty ""
vfsSnapshot (Just lspEnv) = LSP.runLspT lspEnv $ LSP.getVirtualFiles
vfsSnapshot (Just lspEnv) = LSP.runLspT lspEnv LSP.getVirtualFiles


addIdeGlobal :: IsIdeGlobal a => a -> Rules ()
@@ -596,7 +598,7 @@ shakeOpen recorder lspEnv defaultConfig logger debouncer

dirtyKeys <- newTVarIO mempty
-- Take one VFS snapshot at the start
vfs <- atomically . newTVar =<< vfsSnapshot lspEnv
vfs <- newTVarIO =<< vfsSnapshot lspEnv
pure ShakeExtras{..}
shakeDb <-
shakeNewDatabase
@@ -683,7 +685,7 @@ shakeRestart recorder IdeState{..} reason acts =
shakeSession
(\runner -> do
let log = logWith recorder
(stopTime,()) <- duration (cancelShakeSession runner)
(stopTime,()) <- duration $ logErrorAfter 10 recorder $ cancelShakeSession runner
res <- shakeDatabaseProfile shakeDb
backlog <- readTVarIO $ dirtyKeys shakeExtras
queue <- atomicallyNamed "actionQueue - peek" $ peekInProgress $ actionQueue shakeExtras
@@ -706,6 +708,11 @@ shakeRestart recorder IdeState{..} reason acts =
-- See https://github.com/haskell/ghcide/issues/79
(\() -> do
(,()) <$> newSession recorder shakeExtras shakeDb acts reason)
where
logErrorAfter :: Seconds -> Recorder (WithPriority Log) -> IO () -> IO ()
logErrorAfter seconds recorder action = flip withAsync (const action) $ do
sleep seconds
logWith recorder Error (LogBuildSessionRestartTakingTooLong seconds)

notifyTestingLogMessage :: ShakeExtras -> T.Text -> IO ()
notifyTestingLogMessage extras msg = do
@@ -1100,7 +1107,7 @@ defineEarlyCutoff' doDiagnostics cmp key file old mode action = do
-- No changes in the dependencies and we have
-- an existing successful result.
Just (v@(Succeeded _ x), diags) -> do
ver <- estimateFileVersionUnsafely state key (Just x) file
ver <- estimateFileVersionUnsafely key (Just x) file
doDiagnostics (vfsVersion =<< ver) $ Vector.toList diags
return $ Just $ RunResult ChangedNothing old $ A v
_ -> return Nothing
@@ -1121,7 +1128,7 @@ defineEarlyCutoff' doDiagnostics cmp key file old mode action = do
\(e :: SomeException) -> do
pure (Nothing, ([ideErrorText file $ T.pack $ show e | not $ isBadDependency e],Nothing))

ver <- estimateFileVersionUnsafely state key res file
ver <- estimateFileVersionUnsafely key res file
(bs, res) <- case res of
Nothing -> do
pure (toShakeValue ShakeStale bs, staleV)
@@ -1147,12 +1154,11 @@ defineEarlyCutoff' doDiagnostics cmp key file old mode action = do
estimateFileVersionUnsafely
:: forall k v
. IdeRule k v
=> Values
-> k
=> k
-> Maybe v
-> NormalizedFilePath
-> Action (Maybe FileVersion)
estimateFileVersionUnsafely state _k v fp
estimateFileVersionUnsafely _k v fp
| fp == emptyFilePath = pure Nothing
| Just Refl <- eqT @k @GetModificationTime = pure v
-- GetModificationTime depends on these rules, so avoid creating a cycle
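The logErrorAfter helper added in this file follows the usual withAsync watchdog shape: the wrapped action is the continuation, and the watchdog thread (which sleeps and then logs) is cancelled automatically once the action returns. A standalone sketch of the same pattern, with putStrLn standing in for the recorder (an assumption made for illustration only):

```haskell
import Control.Concurrent.Async (withAsync)
import System.Time.Extra (Seconds, sleep)

-- Run an action; if it is still running after the given number of seconds,
-- print a warning. Leaving withAsync's scope cancels the watchdog thread,
-- so nothing is printed when the action finishes in time.
warnAfter :: Seconds -> String -> IO a -> IO a
warnAfter seconds msg action =
  flip withAsync (const action) $ do
    sleep seconds
    putStrLn msg

main :: IO ()
main = warnAfter 1 "cancellation is taking too long" (sleep 2)
```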
65 changes: 46 additions & 19 deletions hls-graph/src/Development/IDE/Graph/Internal/Database.hs
@@ -8,6 +8,7 @@
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TupleSections #-}

module Development.IDE.Graph.Internal.Database (newDatabase, incDatabase, build, getDirtySet, getKeysAndVisitAge) where

@@ -32,14 +33,15 @@ import Data.IORef.Extra
import Data.Maybe
import Data.Traversable (for)
import Data.Tuple.Extra
import Debug.Trace (traceM)
import Development.IDE.Graph.Classes
import Development.IDE.Graph.Internal.Rules
import Development.IDE.Graph.Internal.Types
import qualified Focus
import qualified ListT
import qualified StmContainers.Map as SMap
import System.Time.Extra (duration, sleep)
import System.IO.Unsafe
import System.Time.Extra (duration)

newDatabase :: Dynamic -> TheRules -> IO Database
newDatabase databaseExtra databaseRules = do
@@ -120,7 +122,7 @@ builder db@Database{..} stack keys = withRunInIO $ \(RunInIO run) -> do
pure (id, val)

toForceList <- liftIO $ readTVarIO toForce
let waitAll = run $ mapConcurrentlyAIO_ id toForceList
let waitAll = run $ waitConcurrently_ toForceList
case toForceList of
[] -> return $ Left results
_ -> return $ Right $ do
@@ -170,6 +172,10 @@ compute db@Database{..} stack key mode result = do
deps | not(null deps)
&& runChanged /= ChangedNothing
-> do
-- IMPORTANT: record the reverse deps **before** marking the key Clean.
-- If an async exception strikes before the deps have been recorded,
-- we won't be able to accurately propagate dirtiness for this key
-- on the next build.
void $
updateReverseDeps key db
(getResultDepsDefault [] previousDeps)
@@ -224,7 +230,8 @@ updateReverseDeps
-> [Key] -- ^ Previous direct dependencies of Id
-> HashSet Key -- ^ Current direct dependencies of Id
-> IO ()
updateReverseDeps myId db prev new = uninterruptibleMask_ $ do
-- mask to ensure that all the reverse dependencies are updated
Collaborator:
This one I wonder if we could avoid by pushing the scope of the STM transaction out a bit?

Collaborator Author (@pepeiborra, Mar 10, 2022):
STM transactions can still be interrupted by exceptions, so the question is how badly we want to record the reverse deps. The answer is: very badly. Failure to do so will lead to unsoundness, especially if this is the first build.

EDIT: oh, correcting myself: since we don't mark the build as done until after the reverse keys have been updated, I don't think the mask is needed at all.

updateReverseDeps myId db prev new = do
forM_ prev $ \d ->
unless (d `HSet.member` new) $
doOne (HSet.delete myId) d
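To make the trade-off in the thread above concrete, a hedged sketch with hypothetical helpers (not from the patch): each atomically call is all-or-nothing on its own, but a loop of separate transactions, like the forM_ loop in updateReverseDeps above, can be interrupted between transactions; widening the STM scope makes the whole batch atomic at the cost of a larger transaction.

```haskell
import Control.Concurrent.STM
import Control.Monad (forM_)

-- One transaction per element: an asynchronous exception can arrive between
-- iterations, leaving only a prefix of the updates applied (this is the
-- window the old uninterruptibleMask_ closed).
updateEach :: [TVar Int] -> IO ()
updateEach vars = forM_ vars $ \v -> atomically (modifyTVar' v (+ 1))

-- One transaction for the whole batch: either every TVar is updated or none
-- is. This is the "push the scope of the STM transaction out" option.
updateAll :: [TVar Int] -> IO ()
updateAll vars = atomically $ forM_ vars $ \v -> modifyTVar' v (+ 1)
```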
@@ -252,20 +259,27 @@ transitiveDirtySet database = flip State.execStateT HSet.empty . traverse_ loop
next <- lift $ atomically $ getReverseDependencies database x
traverse_ loop (maybe mempty HSet.toList next)

-- | IO extended to track created asyncs to clean them up when the thread is killed,
-- generalizing 'withAsync'
--------------------------------------------------------------------------------
-- Asynchronous computations with cancellation

-- | A simple monad to implement cancellation on top of 'Async',
-- generalizing 'withAsync' to monadic scopes.
newtype AIO a = AIO { unAIO :: ReaderT (IORef [Async ()]) IO a }
deriving newtype (Applicative, Functor, Monad, MonadIO)

-- | Run the monadic computation, cancelling all the spawned asyncs if an exception arises
runAIO :: AIO a -> IO a
runAIO (AIO act) = do
asyncs <- newIORef []
runReaderT act asyncs `onException` cleanupAsync asyncs

-- | Like 'async' but with built-in cancellation.
-- Returns an IO action to wait on the result.
asyncWithCleanUp :: AIO a -> AIO (IO a)
asyncWithCleanUp act = do
st <- AIO ask
io <- unliftAIO act
-- mask to make sure we keep track of the spawned async
liftIO $ uninterruptibleMask $ \restore -> do
a <- async $ restore io
atomicModifyIORef'_ st (void a :)
Expand All @@ -284,27 +298,40 @@ withRunInIO k = do
k $ RunInIO (\aio -> runReaderT (unAIO aio) st)

cleanupAsync :: IORef [Async a] -> IO ()
cleanupAsync ref = uninterruptibleMask_ $ do
asyncs <- readIORef ref
-- mask to make sure we interrupt all the asyncs
cleanupAsync ref = uninterruptibleMask $ \unmask -> do
asyncs <- atomicModifyIORef' ref ([],)
-- interrupt all the asyncs without waiting
mapM_ (\a -> throwTo (asyncThreadId a) AsyncCancelled) asyncs
mapM_ waitCatch asyncs
-- Wait until all the asyncs are done
-- But if it takes more than 10 seconds, log to stderr
unless (null asyncs) $ do
let warnIfTakingTooLong = unmask $ forever $ do
sleep 10
traceM "cleanupAsync: waiting for asyncs to finish"
Collaborator:
Do we really want to use Debug.Trace here?

Collaborator Author:
If you are seeing this trace, then most likely hls-graph is broken.

Plus, I didn't want to thread a logger around....

Collaborator:
You're in IO, you could just print to stderr. It's a small difference, but I do think not having Debug.Trace in production code is good.

Collaborator Author:
That's a fair point

withAsync warnIfTakingTooLong $ \_ ->
mapM_ waitCatch asyncs
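A minimal sketch of the alternative suggested in the thread above (assumed, not part of this patch): since cleanupAsync already runs in IO, the periodic warning could be written to stderr directly instead of going through Debug.Trace.

```haskell
import Control.Monad (forever)
import System.IO (hPutStrLn, stderr)
import System.Time.Extra (sleep)

-- Periodically warn on stderr while waiting for the asyncs to finish; meant
-- to run under withAsync, as above, so it is cancelled as soon as the real
-- wait completes.
warnLoop :: IO ()
warnLoop = forever $ do
  sleep 10
  hPutStrLn stderr "cleanupAsync: waiting for asyncs to finish"
```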

data Wait
= Wait {justWait :: !(IO ())}
| Spawn {justWait :: !(IO ())}

data Wait a
= Wait {justWait :: !a}
| Spawn {justWait :: !a}
deriving Functor
fmapWait :: (IO () -> IO ()) -> Wait -> Wait
fmapWait f (Wait io) = Wait (f io)
fmapWait f (Spawn io) = Spawn (f io)

waitOrSpawn :: Wait (IO a) -> IO (Either (IO a) (Async a))
waitOrSpawn :: Wait -> IO (Either (IO ()) (Async ()))
waitOrSpawn (Wait io) = pure $ Left io
waitOrSpawn (Spawn io) = Right <$> async io

mapConcurrentlyAIO_ :: (a -> IO ()) -> [Wait a] -> AIO ()
mapConcurrentlyAIO_ _ [] = pure ()
mapConcurrentlyAIO_ f [one] = liftIO $ justWait $ fmap f one
mapConcurrentlyAIO_ f many = do
waitConcurrently_ :: [Wait] -> AIO ()
waitConcurrently_ [] = pure ()
waitConcurrently_ [one] = liftIO $ justWait one
waitConcurrently_ many = do
ref <- AIO ask
waits <- liftIO $ uninterruptibleMask $ \restore -> do
waits <- liftIO $ traverse (waitOrSpawn . fmap (restore . f)) many
-- mask to make sure we keep track of all the asyncs
waits <- liftIO $ uninterruptibleMask $ \unmask -> do
waits <- liftIO $ traverse (waitOrSpawn . fmapWait unmask) many
let asyncs = rights waits
liftIO $ atomicModifyIORef'_ ref (asyncs ++)
return waits
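For context on the masking in asyncWithCleanUp and waitConcurrently_, a standalone sketch with hypothetical helper names: the thread is spawned and registered under uninterruptibleMask so that an asynchronous exception cannot arrive between the spawn and the registration, while the payload itself runs with exceptions restored.

```haskell
import Control.Concurrent.Async (Async, async, cancel)
import Control.Exception (uninterruptibleMask)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Spawn a thread and record it in a shared registry so it can be cancelled
-- later. The mask closes the window in which the Async exists but has not
-- been recorded yet; the payload runs unmasked via restore.
spawnTracked :: IORef [Async ()] -> IO () -> IO (Async ())
spawnTracked reg act = uninterruptibleMask $ \restore -> do
  a <- async (restore act)
  atomicModifyIORef' reg (\as -> (a : as, ()))
  pure a

-- Cancel everything recorded so far (a much simplified stand-in for
-- cleanupAsync, without the waiting and logging).
cancelAll :: IORef [Async ()] -> IO ()
cancelAll reg = readIORef reg >>= mapM_ cancel

main :: IO ()
main = do
  reg <- newIORef []
  _ <- spawnTracked reg (putStrLn "worker running")
  cancelAll reg
```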