7.28.2009

Metro's history of failure: Part II

The Writing on the Wall

In June of 2005, three fully loaded trains nearly collided under the Potomac River between the Rosslyn and Foggy Bottom-GWU stations. In that incident, the computerized control system failed to detect a stopped train along a segment of track. Operators of the two trains behind the stopped train had to engage the emergency "mushroom" brake to prevent a collision. The first train stopped a mere 35 feet from the back of the stationary train, and the second following train stopped with only 12 feet to spare. The now-retired operator of the first following train, Larry Mitchell, told the Washington Examiner, "I shudder to think of what might have happened, we were under the Potomac, three trains fully loaded. The casualty rate would have been enormous."

Metro officials blamed the 2005 near-miss on the failure of a communications cable between the Rosslyn and Foggy Bottom stations. This failure caused a 1,000-foot stretch of track to go dark, meaning it stopped reporting the location of trains. Metro has not commented on how this failure went undetected until it nearly caused a collision. Little technical information has been made public regarding this incident.

In any event, the 2005 problem revealed a critical flaw in the Automatic Train Control system. If the track sensors or communications lines failed, there was no backup. On a dark segment of track, a train operating in automatic mode would proceed at full speed into a parked train. The only line of defense would be an alert operator hitting an emergency brake with enough distance to stop.

When asked about the relevance of the 2005 incident to the 2009 Red Line crash, Metro spokesman Steven Taubenkibel refused to compare the two incidents, citing the ongoing investigation. Taubenkibel also noted that Metro General Manager John Catoe did not work for Metro in 2005, and may have been unaware of the incident.

The NTSB has made public that the circuit at the site of the June 22, 2009 crash had been malfunctioning as long ago as 2007. From a July 24 article in the Washington Post:
Federal investigators found that the circuit began "fluttering," or intermittently malfunctioning, after Metro crews installed a device known as an impedance bond, also called a Wee-Z bond, at the circuit in December 2007, according to a safety board advisory issued Thursday. Metro has been installing new bonds across the 106-mile railroad as part of a project to boost power so the agency can run more eight-car trains, which consume more electricity than shorter trains. Each track circuit has two Wee-Z bonds.

The fluttering indicated a problem with the circuit, according to the data examined by the NTSB. After Metro crews replaced the second Wee-Z bond in the same circuit June 17, the circuit deteriorated to the most dangerous stage: It intermittently failed to detect the presence of a train. Five days later, a train idling in that circuit outside the Fort Totten Station was hit from behind by another train.
It is unclear who at Metro was aware of this problem prior to the accident. It is likely the NTSB report on the June 22 accident will focus on two specific areas: what caused the circuit to malfunction, and why Metro failed to respond to that malfunction prior to the crash.

Slices of "Swiss Cheese"

If one reviews the NTSB accident reports from the 1996 crash at Shady Grove, and the 2004 crash at Woodley Park, a common thread emerges. While the immediate causes of the two crashes were different, the NTSB notes a growing concern about Metro's organizational structure.

The Metrorail system is a highly complex and tightly coupled system. Trains operate in close proximity to one another, and there is little room for error. The safety of customers and employees relies on a computerized control system. In organizational management, there is a safety model referred to as the "Swiss Cheese model." This posits that a complex system is made up of parts (slices of cheese) that can each have points of failure (holes). By having the proper arrangement of slices and a minimal number of holes, safety can be maintained in a complicated environment. Accidents occur when the holes in the slices align.
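
To put rough numbers on that idea, here is a minimal Python sketch. It assumes the slices fail independently of one another, which real systems rarely do, and the layer names and probabilities are invented purely for illustration; they are not Metro data.

```python
from math import prod


def chance_all_layers_fail(hole_probabilities):
    """Chance that every slice has a hole in the same spot at once.

    Assumes the layers fail independently (a simplifying assumption).
    """
    return prod(hole_probabilities)


# Hypothetical defenses and the chance each one fails on a given day.
# These numbers are illustrative only.
layers = {
    "track circuit misses a train": 0.01,
    "maintenance misses the fault": 0.1,
    "operator brakes too late": 0.2,
}

all_fail = chance_all_layers_fail(layers.values())

# Remove one slice (the maintenance check) and see how the risk changes.
without_maintenance = chance_all_layers_fail(
    p for name, p in layers.items() if name != "maintenance misses the fault"
)

print(f"All three layers in place: {all_fail:.4%}")
print(f"Maintenance check removed: {without_maintenance:.4%}")
```

Under these made-up numbers, taking away a single slice makes an accident ten times more likely, even though no individual layer got any worse.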

In a system such as Metrorail, the root cause of an accident can likely be traced back to both active and latent failures. If any one of these failures had been addressed prior to the incident, it is likely the crash could have been avoided. Where we go from here depends on what the active and latent failures were, and which can be prevented in the future.

Following the 1996 incident, at the recommendation of the NTSB, Metro made some organizational changes. These changes were made to improve the emphasis on safety. However, following the 2004 Woodley Park incident, the structure at Metro was changed yet again, removing the safety department's direct access to the General Manager. The NTSB had the following comments:
During the investigation of the January 6, 1996, accident at the Shady Grove station, the Safety Board identified employee concerns about WMATA’s organizational structure, specifically, a perceived lack of communication and a sense of information isolation. These concerns were addressed by a WMATA safety review committee, which recommended that WMATA change its organizational structure to have the safety department report directly to the general manager (GM). This recommendation was subsequently adopted and implemented, and WMATA’s safety department began reporting directly to the GM.

WMATA’s organizational structure was not an issue in the November 3, 2004, accident at the Woodley Park station. However, following the 2004 accident, WMATA restructured its organization again, reverting back to the safety department having a disconnected responsibility and accountability reporting chain. In effect, this restructuring maneuver rescinded the direct reporting link between the safety department and the GM that had been established as result of the Shady Grove accident. In a letter to WMATA, dated March 31, 2005, the Tri-State Oversight Committee expressed concern about the transit authority’s reorganization, which eliminated the safety department’s direct access to the GM. This postaccident reorganization could recreate the systemic information isolation that existed within WMATA prior to the Shady Grove accident, which in turn could inhibit serious safety problems from being identified or adequately addressed.
Furthermore, on July 14, 2009, Peter M. Rogoff, Administrator of the Federal Transit Administration, testified before the House Subcommittee on Federal Workforce, Postal Service, and the District of Columbia. In his testimony, he reiterated both the NTSB's and the FTA's concerns about Metro's organization.
FTA has conducted several SSO program audits of TOC since Part 659 went into effect on January 1, 1997. The most recent audit was conducted in October 2007. Previous audits took place in 2000 and 2005. FTA also conducted a Safety Review in 1997. The 2007 audit was conducted as part of FTA’s three-year audit cycle for all 27 SSO agencies in the audit program. During this audit, while on-site at TOC and WMATA, FTA also reviewed the progress made by TOC and WMATA to address two findings that were still open from FTA’s 2005 SSO Program audit of TOC. In addition, FTA used this opportunity to assess WMATA’s response to Safety Recommendation R-06-4 from the National Transportation Safety Board (NTSB), which addressed the adequacy of WMATA’s organizational structure and its ability to effectively identify safety issues. Prior to the Woodley Park-Zoo accident, the WMATA Safety Department reported to the General Manager through a Deputy. Shortly after, WMATA changed its organization so that the Chief Safety Officer and head of System Safety and Risk Management (SSRM) was a direct report to the General Manager. NTSB correspondingly classified this recommendation as “Closed – Acceptable Action”.

However, in recent months, WMATA has re-organized the Chief Safety Officer position to report to the Chief Administrative Officer, who reports to the General Manager. FTA asked the TOC to follow up with WMATA. WMATA has assured the TOC that the organizational changes do not adversely affect safety and that the “visibility and importance of the safety department will not diminish”. FTA continues to view the NTSB recommendation as a sound safety model and the current structure at WMATA causes us concern.
In the case of the June 22 crash, there was a host of latent failures. A lack of dedicated funding for Metro has resulted in decades of financial problems. The District's lack of representation in Congress has likely been a contributing factor to this situation as well. The Metrorail system was designed with Automatic Train Control in mind, which drastically reduced the operator's role in maintaining safety. A rigid top-down organizational structure at Metro has consistently made it difficult to respond quickly to safety concerns in a complex and ever-changing environment. The list goes on. All of these factors contributed in part to the crash. The active failure, the malfunction of the circuit between Fort Totten and Takoma, was merely the last slice of cheese.

In the next part of this series, I will address in more detail Metro's financial situation, and how it has affected the emphasis on safety. I know this is intense material, and I'm not adding any snark to lighten it up. I will have some more humorous/creative posts later this week. For now, though, I want to present this in a serious manner. Thanks.

3 comments:

  1. Don't even bother. You don't have a creative bone in your brittle body.

  2. yeah, its not like you're a brilliant creative artist such as David Schwimmer.

    mat and his ilk think that guy is a genius.

  3. I found the last two posts interesting and relevant. I'm all for fun and snarky commentary, but aren't there times we can stop watching the reality tv show and talk about something serious? It never hurt anyone...
