```rust
/// Logs out the current user and prepares the application for a new login session.
///
/// Performs server-side logout, cleans up client state, closes all tabs,
/// and restarts the Matrix runtime. Reports success or failure via `LogoutAction`.
///
/// # Parameters
/// - `is_desktop` - Boolean indicating if the current UI mode is desktop (`true`) or mobile (`false`).
///
/// # Returns
/// - `Ok(())` - Logout succeeded (possibly with cleanup warnings)
/// - `Err(...)` - Logout failed with a detailed error
async fn logout_and_refresh(is_desktop: bool) -> Result<()> {
    // Collect all errors encountered during the logout process
    let mut errors = Vec::new();
    log!("Starting logout process...");

    let Some(client) = get_client() else {
        let error_msg = "Logout failed: No active client found";
        log!("Error: {}", error_msg);
        Cx::post_action(LogoutAction::LogoutFailure(error_msg.to_string()));
        return Err(anyhow::anyhow!(error_msg));
    };

    if !client.matrix_auth().logged_in() {
        let error_msg = "Client not logged in, skipping server-side logout";
        log!("Error: {}", error_msg);
        Cx::post_action(LogoutAction::LogoutFailure(error_msg.to_string()));
        return Err(anyhow::anyhow!(error_msg));
    }

    get_sync_service().unwrap().stop().await;

    log!("Performing server-side logout...");
    match tokio::time::timeout(
        tokio::time::Duration::from_secs(5),
        client.matrix_auth().logout(),
    ).await {
        Ok(Ok(_)) => {
            log!("Server-side logout successful.")
        },
        Ok(Err(e)) => {
            let error_msg = format!("Server-side logout failed: {}. Please try again later", e);
            log!("Error: {}", error_msg);
            Cx::post_action(LogoutAction::LogoutFailure(error_msg.to_string()));
            return Err(anyhow::anyhow!(error_msg));
        },
        Err(_) => {
            let error_msg = "Server-side logout timed out after 5 seconds. Please try again later";
            log!("Error: {}", error_msg);
            Cx::post_action(LogoutAction::LogoutFailure(error_msg.to_string()));
            return Err(anyhow::anyhow!(error_msg));
        },
    }

    // Clean up client state and caches
    log!("Cleaning up client state and caches...");
    CLIENT.lock().unwrap().take();
    SYNC_SERVICE.lock().unwrap().take();
    TOMBSTONED_ROOMS.lock().unwrap().clear();
    IGNORED_USERS.lock().unwrap().clear();
    DEFAULT_SSO_CLIENT.lock().unwrap().take();
    // Note: Taking REQUEST_SENDER closes the channel sender, causing the async_worker
    // task to exit its loop. This triggers the "async_worker task ended unexpectedly"
    // error in the monitor task, but this is expected during logout.
    REQUEST_SENDER.lock().unwrap().take();
    log!("Client state and caches cleared after successful server logout.");

    // The desktop UI has tabs that must be properly closed; the mobile UI has no tabs concept.
    if is_desktop {
        log!("Requesting to close all tabs in desktop");
        let (tx, rx) = oneshot::channel::<bool>();
        Cx::post_action(MainDesktopUiAction::CloseAllTabs { on_close_all: tx });
        match rx.await {
            Ok(_) => {
                log!("Received signal that the MainDesktopUI successfully closed all tabs");
            },
            Err(e) => {
                let error_msg = format!("Close all tabs failed: {e}");
                log!("Error: {}", error_msg);
                Cx::post_action(LogoutAction::LogoutFailure(error_msg.to_string()));
                return Err(anyhow::anyhow!(error_msg));
            },
        }
    }

    log!("Deleting latest user ID file...");
    // We delete latest_user_id here for the following reasons:
    // 1. we delete the latest user ID so that Robrix won't auto-login the next time it starts,
    // 2. we don't delete the session file, so that the user can re-login using that session in the future.
    if let Err(e) = delete_latest_user_id().await {
        errors.push(e.to_string());
    }

    shutdown_background_tasks().await;

    // Restart the Matrix tokio runtime.
    // This is a critical step; failure might prevent future logins.
    log!("Restarting Matrix tokio runtime...");
    if start_matrix_tokio().is_err() {
        // Send failure notification and return immediately, as the runtime is fundamental.
        let final_error_msg = String::from(
            "Logout succeeded, but Robrix could not re-connect to the Matrix backend. Please exit and restart Robrix"
        );
        Cx::post_action(LogoutAction::LogoutFailure(final_error_msg.clone()));
        return Err(anyhow::anyhow!(final_error_msg));
    }
    log!("Matrix tokio runtime restarted successfully.");

    // --- Final result handling ---
    if errors.is_empty() {
        // Complete success
        log!("Logout process completed successfully.");
        Cx::post_action(LogoutAction::LogoutSuccess);
        Ok(())
    } else {
        // Partial success (server logout ok, but cleanup errors)
        let warning_msg = format!(
            "Logout completed, but some cleanup operations failed: {}",
            errors.join("; ")
        );
        log!("Warning: {}", warning_msg);
        Cx::post_action(LogoutAction::LogoutSuccess);
        Ok(())
    }
}
```
Issue 1 (from the initial report): the memory state left behind by desktop mode was not properly handled during the logout process in mobile mode.
According to the initial report, the program's backtrace showed:

```
thread 'main' panicked at /Users/alanpoon/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/deadpool-runtime-0.1.4/src/lib.rs:101:22:
there is no reactor running, must be called from the context of a Tokio 1.x runtime
```

The matrix-sdk version at that time was 9676daee5ab864088993f869de52ec5d8b72cce9.

Issue 1 is not very relevant to the current analysis, as the handling omission was fixed in subsequent versions. We'll mainly discuss issue 2, the deadpool-runtime panic.
The core issues are as follows:
1. The most "dangerous" operation in the entire process is shutting down the tokio runtime and then starting a brand-new runtime.
2. The panic message appears in deadpool-runtime-0.1.4.
Using Cargo.lock for analysis, we get the dependency chain: matrix-sdk -> matrix-sdk-sqlite -> rusqlite -> deadpool-sqlite -> deadpool-runtime. Since it is a tokio-related issue, we used tokio-console to analyze the problem.

Note: Frame 77 in https://github.com/project-robius/robrix/pull/432#discussion_r2171202111 is related to the room_member_manager in the version at that time. (Alex later updated the code to remove room_member_manager, so this issue no longer exists in subsequent versions.)

I added a relatively long sleep after shutdown_background and before the restart, and observed the asynchronous tasks in tokio-console. I found that after calling shutdown_background, many deadpool-related asynchronous tasks were still alive.
This is consistent with the tokio documentation: shutdown_background shuts down the runtime but does not wait for the asynchronous tasks inside it to finish. So we conclude that the panic occurs because we shut down the tokio runtime while its asynchronous tasks still exist; with no runtime left, those tasks (and their destructors) still try to execute, causing the panic.
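To make that behavior concrete, here is a minimal, standalone sketch (not Robrix code); the looping task merely stands in for the deadpool-related tasks that matrix-sdk-sqlite keeps alive internally.

```rust
use std::time::Duration;
use tokio::runtime::Runtime;

fn main() {
    let rt = Runtime::new().expect("failed to build tokio runtime");

    // Stand-in for the deadpool tasks that matrix-sdk-sqlite keeps running.
    rt.spawn(async {
        loop {
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
    });

    // Returns immediately: the runtime is torn down without waiting for the
    // spawned task to wind down. Any code or destructor that still needs the
    // reactor afterwards (e.g. via Handle::current()) hits the
    // "there is no reactor running" panic seen in the backtrace above.
    rt.shutdown_background();

    // In contrast, `shutdown_timeout` gives spawned tasks up to the given
    // duration to finish before tearing the runtime down:
    // rt.shutdown_timeout(Duration::from_secs(5));
}
```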
When we call CLIENT.lock().unwrap().take(), the Matrix client starts to be dropped, but by the time shutdown_background_tasks runs there is no guarantee that all of the client's asynchronous tasks have finished, which leads to the panic.
Since the issue occurs inside matrix-sdk, I read through the matrix-sdk API looking for cleanup logic that could reclaim the deadpool runtime ahead of time, given that the crash happens in deadpool-runtime. Unfortunately, matrix-sdk does not expose an API to proactively shut down its deadpool runtime.
Design of the Logout State Machine
Note: strictly speaking, the design of the state machine is not directly related to the crash, but handling the crash benefits from the logout state machine's design.
```rust
pub async fn clean_app_state(config: &LogoutConfig) -> Result<()> {
    // Clear resources normally, allowing them to be properly dropped.
    // This prevents memory leaks when users logout and login again without closing the app.
    CLIENT.lock().unwrap().take();
    log!("Client cleared during logout");

    SYNC_SERVICE.lock().unwrap().take();
    log!("Sync service cleared during logout");

    REQUEST_SENDER.lock().unwrap().take();
    log!("Request sender cleared during logout");

    // Only clear collections that don't contain Matrix SDK objects
    TOMBSTONED_ROOMS.lock().unwrap().clear();
    IGNORED_USERS.lock().unwrap().clear();
    ALL_JOINED_ROOMS.lock().unwrap().clear();

    let (tx, rx) = oneshot::channel::<bool>();
    Cx::post_action(LogoutAction::CleanAppState { on_clean_appstate: tx });

    match tokio::time::timeout(config.app_state_cleanup_timeout, rx).await {
        Ok(Ok(_)) => {
            log!("Received signal that app state was cleaned successfully");
            Ok(())
        }
        Ok(Err(e)) => Err(anyhow!("Failed to clean app state: {}", e)),
        Err(_) => Err(anyhow!("Timed out waiting for app state cleanup")),
    }
}
```
Now, the reclamation of tokio asynchronous tasks, in particular matrix-sdk's deadpool runtime, happens between clean_app_state and shutdown_background_tasks. In the new state machine there are more operations between CLIENT.lock().unwrap().take() (which starts dropping the asynchronous tasks inside matrix-sdk) and shutdown_background_tasks, giving the program more time to finish that destruction. Note: we still have no API to proactively handle matrix-sdk's deadpool runtime.
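To illustrate the ordering the state machine enforces, here is a simplified sketch. The stage names and the driver loop are illustrative assumptions, not Robrix's actual definitions; it only reuses clean_app_state, shutdown_background_tasks, and start_matrix_tokio from the surrounding discussion.

```rust
use anyhow::Result;

// Hypothetical stage names; the real logout state machine has more states
// (UI cleanup, confirmation signals, error reporting, ...).
enum LogoutStage {
    CleanAppState,
    ShutdownBackgroundTasks,
    RestartRuntime,
    Done,
}

async fn run_logout(config: &LogoutConfig) -> Result<()> {
    let mut stage = LogoutStage::CleanAppState;
    loop {
        stage = match stage {
            // Dropping CLIENT / SYNC_SERVICE here starts destructing matrix-sdk's
            // internals (including its deadpool pool) while the runtime is still alive.
            LogoutStage::CleanAppState => {
                clean_app_state(config).await?;
                LogoutStage::ShutdownBackgroundTasks
            }
            // The work performed between these two stages (UI cleanup, awaiting the
            // oneshot confirmation, timeouts) gives those tasks time to finish, so
            // tearing down the runtime no longer races their destructors.
            LogoutStage::ShutdownBackgroundTasks => {
                shutdown_background_tasks().await;
                LogoutStage::RestartRuntime
            }
            LogoutStage::RestartRuntime => {
                if start_matrix_tokio().is_err() {
                    return Err(anyhow::anyhow!("failed to restart the Matrix tokio runtime"));
                }
                LogoutStage::Done
            }
            LogoutStage::Done => return Ok(()),
        };
    }
}
```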
Now, there are no more deadpool-related panics in the code. As a counter-check, consider the following modified version of clean_app_state:
```rust
pub async fn clean_app_state(config: &LogoutConfig) -> Result<()> {
    // Clear resources normally, allowing them to be properly dropped.
    // This prevents memory leaks when users logout and login again without closing the app.
    CLIENT.lock().unwrap().take();
    log!("Client cleared during logout");

    SYNC_SERVICE.lock().unwrap().take();
    log!("Sync service cleared during logout");

    REQUEST_SENDER.lock().unwrap().take();
    log!("Request sender cleared during logout");

    // Only clear collections that don't contain Matrix SDK objects
    TOMBSTONED_ROOMS.lock().unwrap().clear();
    IGNORED_USERS.lock().unwrap().clear();
    ALL_JOINED_ROOMS.lock().unwrap().clear();

    // match tokio::time::timeout(config.app_state_cleanup_timeout, rx).await {
    //     Ok(Ok(_)) => {
    //         log!("Received signal that app state was cleaned successfully");
    //         Ok(())
    //     }
    //     Ok(Err(e)) => Err(anyhow!("Failed to clean app state: {}", e)),
    //     Err(_) => Err(anyhow!("Timed out waiting for app state cleanup")),
    // }

    Ok(())
}
```
If we comment out the code as shown above, removing this time-consuming wait, and then log out, we can still see deadpool-runtime panics. This confirms that the extra time between dropping the client and shutting down the runtime is what prevents the panic.
About Shutdown and Leak
The above is an analysis of logout. Since shutdown (closing the process) is the other exit path, I feel obligated to address its issues as well.
Through the above analysis, I have reached the conclusion:
The deadpool-runtime panic that occurs during the exit process is caused by not fully ending asynchronous tasks before shutting down the runtime. In other words, it is entirely an issue of destruction order.
In the logout state machine, because we adjusted the ordering, the client gets more time to finish destructing while it is being dropped, avoiding the deadpool-runtime panic. However, we have no equivalent state machine to perform that series of operations during shutdown, and matrix-sdk provides no API to release its deadpool runtime ahead of time. So if we shut down Robrix directly, the crash backtrace points at the Robrix program.
Therefore, I believe that since we have no way to release these asynchronous tasks from within the program, we might as well forget them outright, especially the tokio runtime. Since the program is about to shut down anyway, forgetting them is acceptable.
This is the core idea behind using leak during shutdown.
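As a rough sketch of what leaking the runtime could look like, assuming the runtime lives in a global Mutex<Option<Runtime>> (the storage details here are an assumption, and the real leak_runtime in Robrix may differ):

```rust
use std::mem;
use std::sync::Mutex;
use tokio::runtime::Runtime;

// Assumed storage for the Matrix tokio runtime; Robrix's actual global may differ.
static TOKIO_RUNTIME: Mutex<Option<Runtime>> = Mutex::new(None);

/// Take the runtime out of its global slot and deliberately never drop it.
///
/// Because the `Runtime`'s destructor never runs, its worker threads and reactor
/// stay alive until the process exits, so no destructor (deadpool's included)
/// has to run in a "no reactor running" context. The operating system reclaims
/// the memory and threads when the process terminates.
fn leak_runtime() {
    if let Some(rt) = TOKIO_RUNTIME.lock().unwrap().take() {
        mem::forget(rt);
    }
}
```

The same std::mem::forget pattern would apply to leak_client, leak_sync_service, and leak_request_sender; it is only defensible because the process is exiting anyway.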
Current Situation
The current situation is a bit interesting. In the current version, I tried:
```rust
    // Clear user profile cache first to prevent thread-local destructor issues.
    // This must be done before leaking the tokio runtime.
    clear_all_caches();
    log!("Cleared user profile cache");

    // Set logout in progress to suppress error messages
    LOGOUT_IN_PROGRESS.store(true, Ordering::Relaxed);

    // Immediately take and leak all resources to prevent any destructors from running.
    // This is a controlled leak at shutdown to avoid the deadpool panic.

    // Take the runtime first and leak it
    leak_runtime();

    // Take and leak the client
    leak_client();

    // Take and leak the sync service
    leak_sync_service();

    // Take and leak the request sender
    leak_request_sender();

    // Don't clear any collections or caches as they might contain references
    // to Matrix SDK objects that would trigger the deadpool panic
    log!("Shutdown cleanup completed - all resources leaked to prevent panics");
}
```
This code is in src/app.rs. I now try not calling this method at all, i.e. not actively leaking these resources during shutdown, and the program still does not crash. (Note that I did encounter crashes in previous versions.)
Current safe process: logout → clean_app_state() → Matrix SDK core resources already cleared → clear_all_caches() → safe shutdown → do nothing → operating system reclaims resources → safe
Key contributions of the state machine:
- Separated the handling logic of logout from that of shutdown
- Ensured the correct cleanup order (dependents are dropped before the things they depend on)
- Provided clear timing guarantees (dangerous operations are performed only when it is safe to do so)
Leaking is indeed a dangerous operation. Of course, the current situation suggests that the previous code might have been over-protective, and we might consider removing the related code.