seperis ([personal profile] seperis) wrote 2009-04-13 09:20 pm

amazon and codefixes - oh, this is something i might know something about!

A possible explanation, gakked from [livejournal.com profile] trobadora:

AmazonFail: An Inside Look at What Happened

Amazon managers found that an employee who happened to work in France had filled out a field incorrectly and more than 50,000 items got flipped over to be flagged as "adult," the source said. (Technically, the flag for adult content was flipped from 'false' to 'true.')
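
To make the quoted explanation concrete, here is a minimal Python sketch of how one wrong value in a category-level field could flip the adult flag on every item under it. The catalog layout, the field names, and the `apply_category_flag` helper are all hypothetical; nothing here is known about Amazon's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    category: str
    adult: bool = False


def apply_category_flag(items, category_flags):
    """Copy each category's adult setting onto every item in that category.

    Returns the number of items whose flag changed.
    """
    changed = 0
    for item in items:
        # Categories with no entry leave the item's flag as it was.
        new_value = category_flags.get(item.category, item.adult)
        if new_value != item.adult:
            item.adult = new_value
            changed += 1
    return changed


# Made-up catalog for illustration only.
catalog = [
    Item("Health handbook", "health"),
    Item("Romance novel", "romance"),
    Item("Cookbook", "cooking"),
]

# One mis-filled field: "romance" was supposed to stay False.
flipped = apply_category_flag(catalog, {"romance": True})
```

With three items the damage is one flipped row; with a real catalog, the same single field could plausibly touch tens of thousands.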


Note: If they are telling the truth about what happened, this applies. And actually, it would apply if they lied, but worse. One error is one thing, but if this was a deliberate system-wide build that made the change, pretty much the same thing applies, but with less sympathy.

My expertise is not expertise, it is anecdata, but it's also ten builds and fifty emergency releases of professional anecdata, so take that as you will.

I am a professional tester because at some point, it occurred to people that things worked better when there was a level of testing that was specifically designed to mimic the experiences of the average user with a change to a program. Of course, they didn't use average users, they used former caseworkers and programmers, but the point stands.



I'm a professional program tester and do user acceptance, which means I am the last line of defense for users before we release a change to the program, major and minor. It's a web-based program with three very idiotic ways to interface with it online for a user and about fifty for other agencies to do automatically, and I won't go into our vendor interfaces because it hurts me inside. I am one of thirty user acceptance testers for this program, because it's huge and covers a massive number of things and interfaces with federal and state level agencies outside of our own internal agencies. I test things straight from the hands of coders in emergency releases and also after they've gone through two other levels of testing in our quarterly builds.

This does ring true to my experience when something just goes stupid. And when I say stupid, I mean, someone accidentally killed off pieces of welfare policy with a misflag once, and that's not even the stupidest thing I've had to test when the program was built and is still coded modularly and the coders are in different parts of the country and sometimes at home in India when working on this. And none of them ever know what anyone else is doing.

While I have no idea what amazon's model looks like, to do a rollback on a change for us, even a minor one, it goes like this:

1.) Report
2.) Reproduction in one of our environments.
3.) Code fix and discussion and so many meetings, God. (emergency releases may not go through this.)
4.) DEV environment 1 (theoretical construct of the program, works very well, nothing like the real thing)
5.) DEV environment 2 (closer to the actual program, but not by much) (sometimes we do not use both Dev 1 and Dev 2)
6.) SIT (sometimes skipped for emergency releases) (I have issues with their methodology.)
7.) User Acceptance (me! And some other people, somewhat close to field conditions with database as of last mass update, usually two to three months before)
8.) Prodfix (optional) (also me! And some other people, almost perfect mirror of field conditions with full database)
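
As a rough sketch, the list above can be modeled as an ordered pipeline in which some stages get skipped. The stage names follow the list; the `skippable_in_emergency` and `optional` flags and the `plan_release` function are my own simplification for illustration, not how our actual tooling works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    skippable_in_emergency: bool = False
    optional: bool = False


# Flags are a simplified reading of the notes in the list above.
PIPELINE = [
    Stage("Report"),
    Stage("Reproduction"),
    Stage("Code fix and meetings", skippable_in_emergency=True),
    Stage("DEV 1"),
    Stage("DEV 2"),
    Stage("SIT", skippable_in_emergency=True),
    Stage("User Acceptance"),
    Stage("Prodfix", optional=True),
]


def plan_release(emergency=False, use_prodfix=False):
    """Return the ordered stage names a change would pass through."""
    plan = []
    for stage in PIPELINE:
        if emergency and stage.skippable_in_emergency:
            continue
        if stage.optional and not use_prodfix:
            continue
        plan.append(stage.name)
    return plan


normal_plan = plan_release()
emergency_plan = plan_release(emergency=True, use_prodfix=True)
```

Even the emergency plan still has six stops in it, which is the point: there is no version of this where a fix goes straight from a coder's hands to the field.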

If it's really desperate, it goes to prodfix instead of or in addition to User Acceptance, which is the only environment we have that nearly-perfectly mirrors live field conditions and is fully updated with our field database as of five o'clock COB the day before. For me to do a basic test, they give me a (really horrifyingly short) version of events and if I get lucky, I get to see screenshots of the problem in progress.

[If I win the lottery, someone uploaded the specific patches themselves for me to look at, and I get to see what is going on pre-compiling. That has happened once. I did not take advantage of it. I kick myself sometimes.]

Usually, I get a fifth hand account that's gone through eight other people on what went wrong and what function I'm supposed to test and in what order to do it in. Depending on severity, I have four hours to four days to write the test (or several tests, or several variations of the same test for different user conditions, or different input conditions), send it to the person who owns the defect, have them check it, then I run the test in full, then fail or pass it. Or run it in full, fail or pass, then run it in prodfix, fail or pass it.

[Sometimes, I have a coder call me and we both stare in horror at our lot in life when both of us really don't know what the hell went wrong and hope to God this didn't break more things.]

The fastest I've ever seen an emergency release fix go through is three days from report to implementation, and at least once, we had a massive delay when they were too eager and crashed our database because the rollback didn't match the new information entered into the system since the problem started.
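
A minimal sketch of why that rollback went wrong, assuming a hypothetical table where each row carries an `updated_at` timestamp and a snapshot taken before the bad change. The row layout, the dates, and both function names are invented for illustration.

```python
from datetime import datetime


def rows_blocking_rollback(rows, change_applied_at):
    """Return ids of rows edited after the bad change went in.

    A naive rollback to the pre-change snapshot would overwrite these.
    """
    return [row["id"] for row in rows if row["updated_at"] > change_applied_at]


def safe_rollback(rows, snapshot, change_applied_at):
    """Restore snapshot values only for rows untouched since the change.

    Returns the ids that need manual review instead of a blind restore.
    """
    blocked = set(rows_blocking_rollback(rows, change_applied_at))
    for row in rows:
        if row["id"] not in blocked and row["id"] in snapshot:
            row["value"] = snapshot[row["id"]]
    return sorted(blocked)


# Hypothetical data: row 2 received new information after the change.
change_time = datetime(2009, 4, 10, 20, 0)
rows = [
    {"id": 1, "value": "broken", "updated_at": datetime(2009, 4, 10, 20, 0)},
    {"id": 2, "value": "new case data", "updated_at": datetime(2009, 4, 12, 9, 30)},
]
snapshot = {1: "good", 2: "old case data"}
needs_review = safe_rollback(rows, snapshot, change_time)
```

Row 1 goes back to its old value; row 2 is left alone and flagged, because restoring it would throw away what was entered since the problem started.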

[And since this is welfare and under federal jurisdiction, the state gets fined by the feds when we cannot issue benefits correctly or have egregious errors. Feds are really, really politely nasty about this sort of thing. And OIG, who audits us for errors, hates this program like you would not believe. To say there is motivation for speed is to understate the case.]

The program I test is huge, and terrifyingly complicated, and unevenly coded, and we can easily crash the servers for incredibly stupid small-seeming things. Amazon is about a hundred times larger. We do four major builds and four minor (just like major, just with a different name) per year, plus upwards of thirty emergency releases between builds. Our releases aren't live but overnight batched when the program goes to low-use after 8 PM, so we have some leeway if something goes dramatically bad or our testing isn't thorough enough. Which you know, that also happens. Amazon is always up and while it has the same constant database updates we do, I'm betting it also has more frequent normal code updates, both automatic and human initiated.

If this is actually what happened, then the delay in fixing it makes sense, at least in my experience. Unless they release live code without testing it in an environment that is updated to current database conditions, which um, wow, see the thing where we crashed the state servers? The state is cheap and they suck and even they don't try to do even a minor release without at least my department getting to play with it first and give yea or nay because of that.



Short version: this matches my testing experience and also tells you more than you ever wanted to know about my daily life and times. YMMV for those who have a different model for code releases and updates.

And to add, again, if this is true, I am seriously feeling for the tech dept right now. Having to do unplanned system-wide fixes sucks. Someone is leaving really unkind post-it notes for the French coder. Not that I ever considered doing that or anything.

ETA: For us, there are two types of builds and fixes: mod (modification) and main (maintenance). The former is actual new things added to the code, like, I don't know, adding an interface or new policy or changing the color scheme. Maintenance is stuff that is already there that broke and needs to be fixed, like suddenly you can't make a page work. Emergency fixes in general are maintenance, something broken that needs fixing, with occasional mods when the legislature did something dramatic.

None of this means they aren't lying and it wasn't deliberate. My department failed an entire build once due to the errors in it.

Actually, the easiest way to find out if it was deliberate is to hunt down whoever did their testing and check the scripts they wrote, or conversely, if amazon does it all automated, the automated testing scripts will also tell you exactly what was being tested. If it was deliberate, there were several scripts specifically created to test this change.

Example:

If I wrote the user script and was running it in a near-field environment.

Step Four: Query for Beauty's Punishment from main page.
Expected Result: Does not display.
Actual Result: Does not display.
(add screenshot here)

Step Five: Query for Beauty's Punishment from Books.
Expected Result: Displays.
Actual Result: Displays.
(add screenshot here)

We're like the evidence trail. Generally, a tester has to know what they are supposed to be testing to test it. If this was live beta'ed earlier this year with just a few authors, it still had to, at some point, go through some kind of formal testing procedure and record the results. And there would be a test written specifically to see if X Story Marked Adult would appear if searched from the main page, and one specifically written to check that X Story Marked Adult was showing sales figures, either human-run or automated.
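
If the testing were automated instead, the same two steps might look something like this Python sketch. The in-memory `CATALOG`, the `search` function, and its hide-adult-items-on-the-main-page behavior are hypothetical stand-ins for the storefront, not Amazon's real interface.

```python
# Hypothetical in-memory stand-in for the storefront; not Amazon's API.
CATALOG = [
    {"title": "Beauty's Punishment", "department": "Books", "adult": True},
    {"title": "Cookbook", "department": "Books", "adult": False},
]


def search(query, department=None):
    """Main-page search (department=None) hides adult-flagged items;
    a search inside a department shows them."""
    results = []
    for item in CATALOG:
        if query.lower() not in item["title"].lower():
            continue
        if department is None and item["adult"]:
            continue
        if department is not None and item["department"] != department:
            continue
        results.append(item["title"])
    return results


def run_script():
    """Run steps four and five and record pass or fail for each."""
    steps = [
        ("Step Four: query from main page",
         search("Beauty's Punishment"), []),
        ("Step Five: query from Books",
         search("Beauty's Punishment", "Books"), ["Beauty's Punishment"]),
    ]
    return [(name, "PASS" if actual == expected else "FAIL")
            for name, actual, expected in steps]


results = run_script()
```

The expected values are the giveaway: a script that expects an adult-flagged title to vanish from the main page only exists if someone meant for it to vanish.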

[identity profile] seperis.livejournal.com 2009-04-14 03:19 am (UTC)
Oh, I agree, that's why I clarified--only if what they say is absolutely true would the delay make sense in terms of identification, patches, and testing.

(Anonymous) 2009-04-14 04:03 am (UTC)
Sorry, but this 'story' still does not explain what happened to Craig Seymour in February nor why Mark Probst got the reply he got: both got the same generic reply regarding being delisted because of the 'adult' content stuff several weeks apart.

So it cannot have been an error propagating only in the past couple of days.

And if it had been propagating since February, then that story is factually wrong, or we are supposed to believe that Amazon knew about this since then and was not able to fix it in WEEKS? In which case, were they hoping no one would notice ever?!

Plus if it was just a flag switching non-adult to adult, why would it target mostly GLBT and other 'controversial' topics in the US? Because it sounds as if a lot of truly adult books got hit as well... No sorry, this still does not hold water.

Let's also remember that it is not so much the details of the supposed technical problem -which we can all agree probably did take Amazon by surprise- with which most of us have an issue, as much as:

(1) several separate Amazon reps replying to authors with a canned 'adult' argument (Craig Seymour got that response from more than one rep and other authors than Mark Probst got that story too). So that points towards a policy decision, not an error

(2) Amazon only speaking up officially, and contradicting themselves, 24+ hours after Twitter erupted and coined the #amazonfail tag

(3) the direct relationship between ranking and searchability and recommendations/links

(4) the monolithic nature of Amazon, which means that the day they are the only online bookstore left, they'd be free to abuse that power: Google Amazon and you'll find that they are already sadly famous among the cognoscenti for all sorts of related shenanigans. For instance, remember when the comments which happened to mention the fact that the game Spore carried EA's DRM all disappeared from Amazon? This was never answered by Amazon in a way that made any sense.

Bottom line, regardless of the real story, this is not about human errors or about most software built-in obsolescence or about the complexity of huge databases like Amazon's: this is about the fact that we need to support other, smaller bookstores -both physical and online- to ensure that Amazon cannot become the next evil Empire. Which they will, if things continue as they are, because power corrupts...

[identity profile] seperis.livejournal.com 2009-04-14 04:08 am (UTC)
So it cannot have been an error propagating only in the past couple of days.

No, that's what makes it interesting for me as a tester. They did a live beta test of it before a full rollout site-wide. I'm curious as to what their criteria were for the beta test.

Bottom line, regardless of the real story, this is not about human errors or about most software built-in obsolescence or about the complexity of huge databases like Amazon's: this is about the fact that we need to support other, smaller bookstores -both physical and online- to ensure that Amazon cannot become the next evil Empire. Which they will, if things continue as they are, because power corrupts...

I'm not actually disagreeing with any of that. What this specific post was for was to state that it doesn't matter whether it's deliberate or not, what was done cannot be undone with a single flick of the switch. And especially if they've been doing a live beta since February on some authors. A full rollout of new code would be one thing to remove, but they have to track back and pick up their beta code and remove that as well. That might take extra time, especially with at least two months of database updates since then.

(Anonymous) 2009-04-14 04:36 am (UTC)
It's really funny but seeing how much detail you go into has just jogged my memory: 2 years ago, I did some scripting for a video website, trying to increase their search efficiency while implementing a filtering system which allowed access to the adult video titles only to registered users, so I am in point of fact even more knowledgeable about this type of issue than I initially mentioned. Funny... And yes, we used an 'adult content' binary flag. No amount of messing up said flag could have created the mess we have witnessed. Anyway.

- About the technical side of things? Still possible, but IMO irrelevant and a probable red herring. Just like the hacker who tried to claim credit earlier today added 'cute' little touches like 'trying to pick up chicks who do heroin' to add verisimilitude, this addition of 'it was the fault of a dirty foreigner' is just crass

- your point about Amazon not necessarily being able to just flick a switch? Oh yes, absolutely and I did not know that this was ever in doubt. I probably missed it in the deluge of Twitter updates...

No, the concerns I have seen so far seem to be that Amazon could simply restore the best-known titles, while the lesser-known books (possibly the majority of 57K titles and probably not selling a lot anyway) will never be restored once the public outrage is gone.

Which is mostly true by now: most are so used to trusting Amazon that they assume that everything is now fine and we can go back to adding to our wishlist.

I've also heard some concern that the whole thing might have been a way to force or frighten Amazon business partners into using BookSurge, Amazon's highly touted new (paying) technology? I have no real idea what this last is about.

*sorry, I seem to be accumulating an unbelievable number typos in each one of my replies*

[identity profile] seperis.livejournal.com 2009-04-14 04:52 am (UTC)
Ohh yes, that kindle thing is what seriously makes me twitch about their explanation. If they did a rollout on the main site, even if kindle is separate, there is no reason it *shouldn't* be implemented to ebooks.

Heh...

(Anonymous) 2009-04-14 05:04 am (UTC)
It's the whole 'boy who cried wolf' phenomenon: once the trust is gone, everything you hear sounds suspicious and you start to want logic and proof.

On a different topic, you wouldn't believe how much I have to restrain myself not to blurt: "Testing? What is this testing you are speaking of?"

Seriously, most corporations I work with begrudge me even 24 hours for testing, and that's assuming they have a test environment. Usually my own coding goes straight into production environments used by dozens or hundreds of users, regardless of how much I protest and how dire my warnings.

I have lately begun to suspect that is why the past few years have reduced me to a nervous wreck... plus I seem to have become a really irritable and impatient person surrounded by morons: seriously, Rodney McKay is a cuddly bear compared to me.

:D

Re: Heh...

[identity profile] seperis.livejournal.com 2009-04-14 05:07 am (UTC)
Luckily, when the state decided to implement going to this program, it was required legislatively to go through testing with state employees as well as our vendor. Which in retrospect is a freaking miracle, because our vendor testing is--well. There aren't words for it.

re BookSurge and Amazon's Buy buttons used as a weapon

(Anonymous) 2009-04-14 08:07 am (UTC)
link from Neil Gaiman regarding Amazon business practices, with a mention of BookSurge as a bonus:

http://www.nytimes.com/2008/06/16/business/media/16amazon.html

Note that this article dates back to last year and Amazon is already using its market share to its advantage against small publishers; if left unchecked, they will become the Walmart of the bookselling business.

And a few useful links as a Public Service Announcement:
http://www.betterworldbooks.com/
http://www.powells.com/
as well as:
http://www.paperbackswap.com/