Make Service Faults Transparent

2017-05-07  phy25

This article is written in English because I really need to practice the language. Sorry if it is not easy to understand.

A Summary of What's Been Happening Recently

Recently, IT services on my campus have been very unstable.

"Totally nailed the fix"

Why Faults Need to Be Transparent

As you can see, all these issues appeared suddenly, but they did not happen for no reason. Either way, apart from solving the issues, making the process of solving them transparent is also important. Why?

Because information technology is becoming as essential to our lives as water and electricity. At this point it is no longer something "advanced", and people have high expectations of it. What's more, IT develops fast (measured in years, not decades), so people's expectations grow quickly along with it.

It's quite a challenge for campus IT services to keep up with that. To be fair, they are working on it. But if they stay silent, people who consider the service essential will imagine: "It's just messing up my life, and they aren't even trying to fix it." This is clearly a gap in understanding between the two sides.

"Why did you leave the escalator unfixed for ONE MONTH?!"

P.S. A kind reader reminded me that in the "old system", some staff sometimes don't work at all. But I believe the staff on my campus do work hard.

Another problem when IT service is not transparent "in time" is that users don't know whether they should report an issue or simply wait. Of course, most of us will silently wait for the fix; we are all busy, right? But what if the staff don't know about the issue at all? We can't tell whether they know, and most people won't keep trusting forever that "they must be fixing it now". This is an even more misleading situation, and it leads to user dissatisfaction.

I can't think of any disadvantage for a hard-working public service in being actively transparent about its faults, so I strongly believe in this principle.

Ah, yes: I should stress that by transparency I mean "instant transparency". This brings one problem: when you realize that a cause you published earlier was wrong, you have to retract the previous statement, which causes confusion. If everybody is sensible and accepts that people make mistakes, this is not a problem at all, and you can simply leave your previous "wrong" statement there.

In Staytus's demo, an issue turned red again after being in `Monitoring` status
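To make the "leave the wrong statement there" idea concrete, here is a minimal Python sketch of an append-only incident log, assuming a simple four-state model. The class names, the incident and all messages are hypothetical illustrations, not how Staytus actually stores issues. Old updates are never edited or deleted; a later update corrects them, and the current state is whatever the newest update says, so an issue can go back from monitoring to investigating as in the Staytus demo.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List


class State(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


@dataclass
class Update:
    time: datetime
    state: State
    text: str


@dataclass
class Incident:
    title: str
    updates: List[Update] = field(default_factory=list)

    def post(self, time: datetime, state: State, text: str) -> None:
        # Append only: a statement that later turns out to be wrong stays
        # visible, and the next update corrects it instead of erasing it.
        self.updates.append(Update(time, state, text))

    def current_state(self) -> State:
        # The newest update decides the state, so moving "backwards"
        # (e.g. MONITORING -> INVESTIGATING) is allowed.
        # Assumes at least one update has been posted.
        return self.updates[-1].state


# Hypothetical example: the first identified cause turns out to be wrong.
incident = Incident("Campus network outage")
incident.post(datetime(2017, 5, 7, 9, 0), State.INVESTIGATING,
              "We are investigating reports that the network is unavailable.")
incident.post(datetime(2017, 5, 7, 10, 0), State.IDENTIFIED,
              "The cause appears to be a failed core switch.")
incident.post(datetime(2017, 5, 7, 11, 0), State.MONITORING,
              "The switch has been replaced and we are monitoring.")
incident.post(datetime(2017, 5, 7, 12, 0), State.INVESTIGATING,
              "Correction: the outage has returned, so the switch was not "
              "the only cause. We are investigating again.")

for u in incident.updates:
    print(f"{u.time:%H:%M} [{u.state.value}] {u.text}")
```

Readers see the whole history, including the earlier guess about the switch, which is exactly the kind of honest record that only works if everyone accepts that people make mistakes.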

Tools and Platforms Are Not That Important

People may argue: "We might not have the right tool for that right now." Perhaps the tool doesn't fit perfectly, but once you are determined to do the right thing, tools and platforms are not a real obstacle.

A good example on my campus is the student financial service. They always use a forum to answer students' scholarship questions. The forum they chose is not that popular, and I think some of the information about scholarship procedures could be presented more nicely on a single page, but above all, they choose to be transparent.

IT service, by contrast, is:

Thus a digital channel might be a better way to provide transparency.

But what if "the digital way" itself fails? We can host the solution on a school server that rarely fails (probably a standalone one) and is connected to both the Intranet and the Internet. An even better solution is to prepare for the worst: choose a third-party host (a VPS outside the Intranet) or a public service (Weibo or WeChat), and hope that it won't go down when our own infrastructure does. Unreliable as that sounds, if everything fails at once you have won the lottery (maybe once in a lifetime?), and then you won't hesitate to make physical announcements.

Yeah, maybe your physical announcement is not enough...
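One way to "prepare for the worst" is to publish every update to several independent channels at once, so that the campus server going down together with the Intranet does not silence the external copy. The following is a minimal Python sketch under that assumption; the channel names and the two sender functions are hypothetical stand-ins, not real Weibo, WeChat or VPS APIs.

```python
from typing import Callable, Dict, List

# A channel is any function that publishes a message and raises on failure.
Channel = Callable[[str], None]


def publish_everywhere(message: str, channels: Dict[str, Channel]) -> List[str]:
    """Try every channel independently and return the names of those that failed."""
    failed = []
    for name, send in channels.items():
        try:
            send(message)
        except Exception:
            # One broken channel must not stop the others from being tried.
            failed.append(name)
    return failed


# Hypothetical channels: the campus server is down together with the Intranet,
# while a VPS outside the Intranet still works.
external_posts: List[str] = []


def campus_server(message: str) -> None:
    raise ConnectionError("Intranet unreachable")


def external_vps(message: str) -> None:
    external_posts.append(message)


channels = {"campus-server": campus_server, "external-vps": external_vps}
failed = publish_everywhere("Network outage: we are investigating.", channels)
if len(failed) == len(channels):
    # The "lottery" case: every online channel failed, so go physical.
    print("All online channels failed; put up a physical notice.")
```

Here only the campus server fails, so the message still reaches users through the external host, and the physical fallback is reserved for the rare case where every channel is down.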

A Blueprint Specifically for IT Service

When everybody is busy, this kind of customer service cannot rely only on "I contact you and you talk to me". Some self-service ideas can be borrowed here: make status updates available to everyone. When users need help, they can check the updates, rest assured, and calmly wait.

I heard that a support ticket system for IT services is now being considered, but for now, the "status page" is more important.

We have talked about the platforms, right? We will look into them one by one.

As I said above, when a fault happens, users are motivated to "check the status" themselves. Thus frequent, up-to-date status updates that don't need to be pushed to everybody look like a good fit.

The conclusion is that it's best to have:

Of course, this has some technical costs, so choosing an existing public service (in the short term) is fine, too.

"How to publish" is easy: we can prepare statement templates (such as the well-known investigating/identified/monitoring/resolved model) and fill in the details when using them. We can also set rules for updates to maintain transparency, such as publishing at least one update every X hours.

Pre-translated templates on Google's status board; note the "we have additional English explanation" sentences
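The template-plus-rule approach can be sketched in a few lines of Python. This is only an illustration under assumptions: the template wording, the `render` and `update_overdue` helpers and the example problem are hypothetical, and the X in "one update every X hours" is passed in as `max_gap_hours`.

```python
from datetime import datetime, timedelta

# Hypothetical statement templates, one per state of the
# investigating/identified/monitoring/resolved model.
TEMPLATES = {
    "investigating": "We are investigating reports of {problem}. "
                     "Next update within {hours} hours.",
    "identified": "We have identified the cause of {problem}: {cause}.",
    "monitoring": "A fix for {problem} has been applied; we are monitoring the results.",
    "resolved": "The issue with {problem} has been resolved.",
}


def render(state: str, **details: object) -> str:
    """Fill a template with the details of the current fault."""
    return TEMPLATES[state].format(**details)


def update_overdue(last_update: datetime, now: datetime, max_gap_hours: float) -> bool:
    """True if the "at least one update every X hours" rule has been broken."""
    return now - last_update > timedelta(hours=max_gap_hours)


print(render("investigating", problem="slow Wi-Fi in the library", hours=2))
# → We are investigating reports of slow Wi-Fi in the library. Next update within 2 hours.

print(update_overdue(datetime(2017, 5, 7, 9, 0), datetime(2017, 5, 7, 12, 30), max_gap_hours=3))
# → True (3.5 hours have passed since the last update)
```

A check like `update_overdue` could remind whoever is on duty to post something, even if it is only "still investigating".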

We also need someone to publish the messages (I know this is a bigger problem in China). Ideally, a good technical writer would be recruited. But I think it could also be done as a part-time student job: the students sign a confidentiality agreement and join the working discussion group, and whenever a fault happens, they are responsible for publishing the situation based on the templates and the group's conversations. Yes, I bet these conversations sometimes contain passwords or other sensitive information, so confidentiality is important.

Or, if the technical staff can post updates themselves, that's fine too (though they are probably far too busy for that).

"The well-known model" in Staytus

Choosing an Open-Source Solution

As a student without much money, I like open source a lot. So for the status page, of course I would like to use open-source software.

Actually, as far as I know, there is no such "status page service" in China. For example, Leancloud built its status page itself. The "international" hosted services don't seem suitable here, because they may be very slow to access. So we have to rely on self-hosted, open-source ones.

In my opinion, for a school IT service status page, the most important thing is the "updates". The overall "status indicator" is not that important.

Yes, this Apple style doesn't fit

After some research, I found that there are not many dynamic, usable, and actively maintained open-source status page solutions.

Cachet (dev version)

I know some of you hate databases. In that case, using a static page generator is a good idea. Such solutions exist, but they don't seem quite as polished, and setting up the workflow takes a lot of work.
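For readers who prefer to avoid databases, the core of a static status page generator is small. Below is a minimal Python sketch, assuming updates are kept as plain (time, state, text) tuples, for example loaded from a file edited by hand; the function name and page layout are hypothetical. It lists the newest updates first, since updates matter more than an overall indicator here, and escapes all text with the standard library's `html.escape`.

```python
import html
from datetime import datetime
from typing import List, Tuple

# One status update: (time, state, text).
Entry = Tuple[datetime, str, str]


def render_status_page(title: str, entries: List[Entry]) -> str:
    """Render a self-contained HTML page with the newest update first."""
    items = []
    for time, state, text in sorted(entries, key=lambda e: e[0], reverse=True):
        items.append(
            f"<li><time>{time:%Y-%m-%d %H:%M}</time> "
            f"<strong>{html.escape(state)}</strong> {html.escape(text)}</li>"
        )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body><h1>{safe_title}</h1>\n<ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
    )


page = render_status_page("Campus IT Status", [
    (datetime(2017, 5, 7, 9, 0), "investigating", "Wi-Fi in the library is down."),
    (datetime(2017, 5, 7, 10, 30), "resolved", "Wi-Fi <library> is back."),
])
print(page)
```

Writing `page` to an `index.html` file lets any plain web server serve it, including a host outside the Intranet, with no database involved.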

I hope these solutions are helpful. Still, the most important thing is what you are trying to achieve.
