All systems are operational

About This Site

This is the technical status page of combahton GmbH, which lists past incidents and planned maintenance.

Stickied Incidents

Friday 25th September 2020

BGP Session Issues with Upstream: fra10.core-backbone.com @ AS33891

Our upstream provider, Core-Backbone (AS33891), is currently experiencing an issue with the core router at their fra10 location, which is affecting inbound and outbound traffic.

We've disabled our session with AS33891 for now. Traffic is currently being rerouted via RETN.
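
For context, "disabling" a transit session in this way usually just means setting the peer's import and export policies to reject so that no prefixes are exchanged while the problem persists. The following is only a rough sketch of such a change, written in Python with the netmiko library against a Junos router; the device details, BGP group name and policy name are placeholders, not our actual configuration.

    #!/usr/bin/env python3
    """Rough sketch: stop exchanging routes with a single upstream (placeholder names)."""
    from netmiko import ConnectHandler

    # Placeholder device credentials -- not the real router.
    router = {
        "device_type": "juniper_junos",
        "host": "edge-router.example.net",
        "username": "netops",
        "password": "********",
    }

    # Assumed BGP group "CORE-BACKBONE" for the AS33891 session and an existing
    # "REJECT-ALL" policy; both names are hypothetical.
    commands = [
        "set protocols bgp group CORE-BACKBONE import REJECT-ALL",
        "set protocols bgp group CORE-BACKBONE export REJECT-ALL",
    ]

    with ConnectHandler(**router) as conn:
        output = conn.send_config_set(commands, exit_config_mode=False)
        output += conn.commit(and_quit=True)  # commit the candidate config on Junos
        print(output)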

See Status - Core Backbone GmbH for more details.

We will resolve this incident once we have received confirmation from our upstream provider that the issue has been resolved.

  • The issue has been resolved by Core-Backbone. We have re-enabled import/export for our sessions; routing is back to normal.

  • There is still an outage of fra10.core-backbone.com. We're keeping our sessions to AS33891 set to reject for both import and export until we have received a final statement.

    Connectivity is handled over the remaining sessions to other upstreams.

Maintenance - Migrating old ssd-gluster Nodes

We're currently in the process of migrating four old ssd-gluster nodes, which are used by our cloud infrastructure. As the migration is done using the glusterfs replace-brick feature, there is no impact on our customers; the migration happens live during operation. Once all of the old nodes are migrated, we will decommission them. The new nodes will provide a lot more performance thanks to hardware RAID 10 and an NVMe cache using bcache, as our hdd-gluster nodes already do.
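
For readers curious what this looks like in practice, the sketch below wraps the generic GlusterFS replace-brick workflow in a small Python script. The volume name, hostnames and brick paths are placeholders for illustration only and do not reflect our actual topology.

    #!/usr/bin/env python3
    """Minimal sketch of a live GlusterFS brick migration (placeholder names only)."""
    import subprocess

    # Hypothetical volume and brick paths -- not our real layout.
    VOLUME = "ssd-volume"
    OLD_BRICK = "ssd-gluster-old1:/bricks/ssd/brick1"
    NEW_BRICK = "ssd-gluster-new1:/bricks/ssd/brick1"

    def run(cmd):
        """Run a gluster CLI command and return its stdout."""
        print("+", " ".join(cmd))
        return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

    # Swap the old brick for the new one. Data is re-synced by self-heal in the
    # background, so the volume stays online while the migration runs.
    run(["gluster", "volume", "replace-brick", VOLUME, OLD_BRICK, NEW_BRICK,
         "commit", "force"])

    # Watch the self-heal backlog until the new brick has caught up.
    print(run(["gluster", "volume", "heal", VOLUME, "info"]))

Each old node would be swapped out brick by brick in this way, waiting for self-heal to finish before it is decommissioned.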

Informational Notification - Interxion FRA-3

Interxion has informed us about new regulations that apply across the whole campus area and the data centre buildings:

In line with European and national legislation, we are required to assess the risks in the workplace to our employees, customers and partners.

Interxion is reviewing and implementing government advice and considering this as part of the risk assessment process.

Based on this risk-based approach and effective as of now, the wearing of mouth-and-nose protection is required for the duration of any stay in data centres and buildings (staircases, visitor points and other areas, including open spaces) where the safety distance cannot be maintained.

Due to limited supplies globally, Interxion will be unable to provide all customers, visitors or vendors with facial coverings, with the exception of one-time operational emergencies (subject to availability). Therefore, we encourage visitors to conduct their own risk assessments and bring their own appropriate PPE (including face covering) whenever possible.

Maintenance - core3.ffm3 (Interxion FRA3)

We are going to carry out maintenance on our Juniper MX480 router core3.ffm3. The maintenance is intended to increase the capacity of this router so that more traffic can be moved onto it later. The maintenance window is 3rd October 2020 from 02:00 pm till 06:00 pm. We expect full-table sessions on this router to be offline for a maximum of about 120 minutes.

No service interruption is expected; traffic flow will be handled by alternative routes. Downstream customers connected to this router should make sure to properly use the provided default route on their primary session or switch routing over to alternative paths.
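
For downstream customers who want to sanity-check their fallback before the window, one simple pre-check is to confirm which path the default route currently uses and that a backup BGP session is established. Below is a hedged sketch using Python with netmiko against a Junos device; the hostname and credentials are placeholders, and the show commands will differ on other vendors' gear.

    #!/usr/bin/env python3
    """Illustrative pre-maintenance check of the default route (placeholder names)."""
    from netmiko import ConnectHandler

    # Placeholder downstream router -- adjust for your own platform.
    cpe = {
        "device_type": "juniper_junos",
        "host": "cpe.example.net",
        "username": "netops",
        "password": "********",
    }

    with ConnectHandler(**cpe) as conn:
        # Active default route plus any alternative paths that could take over.
        print(conn.send_command("show route 0.0.0.0/0 exact"))
        # Established BGP sessions, to confirm the backup path is up.
        print(conn.send_command("show bgp summary"))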

Past Incidents

Saturday 8th February 2020

No incidents reported

Friday 7th February 2020

No incidents reported

Thursday 6th February 2020

Core Network - Short interruption with packet loss

At the specified time there was a brief network dropout with packet loss, caused by our routing equipment at Interxion Frankfurt. The cause is a router that we will restart in the next few days; after that restart, no further interruptions are expected.

Update 17:40 UTC+1: The issue occurred a second time. We are rebooting the affected FPC right now.

Update 17:58 UTC+1: FPC 0 is back online; we are watching the current situation and hope that the restart has resolved the issue. Most customers saw only a small amount of packet loss; a small portion without redundant connectivity was offline for around 10-15 minutes.

Update 18:57 UTC+1: We have identified an issue with the sflow daemon on FPC 1. The service has been restarted; DDoS detection might have reacted more slowly before this.

Update 07.02.2020 - 22:58 UTC+1: The same issue occurred again. We are looking into it.

Update 07.02.2020 - 23:07 UTC+1: We have implemented additional measures to resolve the issue and will continue monitoring the operation closely.

Update 07.02.2020 - 23:25 UTC+1: According to our latest analysis, today's issue was not related to yesterday's issue. Juniper Junos provides a feature called "ddos-protection", whose main purpose is to rate-limit certain kinds of traffic towards the control plane. In the current case this mechanism (which is in fact not really what anyone should call DDoS protection, rather some fairly senseless rate-limiting) overreacted, generating a high amount of CPU load while randomly dropping packets between the virtual chassis members. We've disabled the mechanism, as it is absolutely useless for us and does not provide any advantage over the already implemented loopback firewall filters. This is quite frustrating, as it caused repeated packet loss of about 20-30 seconds within the last two days. We hope to have the gear under control now. If the issue persists, we will upgrade the firmware image.
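
As a rough illustration of what "disabling the mechanism" involves: on Junos you can first list which protocol groups the built-in ddos-protection policers are flagging and then turn enforcement off globally. The sketch below does this via Python and netmiko; the device details are placeholders and the exact knob names may vary between platforms and Junos releases, so treat it as an assumption-laden example rather than our exact change.

    #!/usr/bin/env python3
    """Hedged sketch: inspect and switch off Junos ddos-protection policers (placeholder names)."""
    from netmiko import ConnectHandler

    router = {
        "device_type": "juniper_junos",
        "host": "core-router.example.net",  # placeholder, not the real device
        "username": "netops",
        "password": "********",
    }

    with ConnectHandler(**router) as conn:
        # Which control-plane protocol groups are currently being policed?
        print(conn.send_command("show ddos-protection protocols violations"))

        # Assumed knobs that disable enforcement on the line cards and the
        # routing engine; verify against your Junos release before using.
        cfg = [
            "set system ddos-protection global disable-fpc",
            "set system ddos-protection global disable-routing-engine",
        ]
        out = conn.send_config_set(cfg, exit_config_mode=False)
        out += conn.commit(and_quit=True)
        print(out)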

Update 08.02.2020 - 16:20 UTC+1: The issue occurred again. This time both devices are showing FPGA-related issues; we are now rebooting both of them.

Update 08.02.2020 - 16:44 UTC+1: The reboot has been carried out. Both devices appear to be stable for now. However, the firmware upgrade planned for tomorrow morning will still be carried out.

Update 08.02.2020 - 21:19 UTC+1: Since the previous measures were unsuccessful, we will take immediate action and check the situation on site. For this purpose we will bring the announced firmware upgrades forward and, if necessary, completely replace the devices. Since the problem has occurred so often in the past few days and all measures taken so far have been unsuccessful, we feel compelled to take this step in the interest of our customers. The maintenance work will be carried out immediately after arrival in Frankfurt (around 09.02.2020, 00:00-02:00 UTC+1).

Update 09.02.2020 - 01:30 UTC+1: Maintenance work has been underway since 01:20. The latest firmware has been installed and a restart has been carried out.

Update 09.02.2020 - 02:00 UTC+1: All routing instances have booted properly again.

Update 09.02.2020 - 15:00 UTC+1: The router was affected again. We have now shut down the virtual chassis member in question and will move the uplinks to the remaining device.

Update 09.02.2020 - 17:53 UTC+1: As the problem still persists, we are now migrating all links to a replacement device. This will impact redundancy for now but should resolve the issue. Additional replacement gear will be ordered tomorrow by express delivery to ensure redundancy going forward.

Update 09.02.2020 - 20:30 UTC+1: Most uplinks have been migrated; we are working on the remaining ones. External connectivity is fully back to normal.

Update 09.02.2020 - 21:08 UTC+1: Maintenance work is finished; all equipment appears to be stable.

Wednesday 5th February 2020

No incidents reported

Tuesday 4th February 2020

No incidents reported

Monday 3rd February 2020

No incidents reported

Sunday 2nd February 2020

No incidents reported