BOTHERING CONSUMERS: WHEN RECOMMENDATION AGENTS DON’T REALLY MAKE OUR LIFE EASIER

Intelligent systems have been used on different types of websites to create personalized messages and to understand consumers’ needs more deeply. They are supposed to facilitate the decision-making process, make Internet browsing easier and give users a sense of social feeling and personalization. So far, research in the field has focused on the positive aspects of using these systems. Little effort has been made, however, to recognize and correct situations in which they do not perform so well. This work reports exploratory research aimed at understanding the breadth of the concept of failure in personalized environments, as well as its antecedents and consequences. Based on the critical incident technique, we collected the opinions of 86 subjects in a multicultural environment and used their responses to build a comprehensive framework of recommendation failure that considers the different motivations for Internet use.


INTRODUCTION
The development of online technologies has brought to Internet-based companies a whole new set of possibilities for data collection and personalization. The traceability feature of online browsing has given access to new kinds of consumer data based on real behavior, which are being used to improve purchasing and consumption experiences on the Internet. A great amount of this information is computed by intelligent systems for personalization purposes. Nowadays, most companies operating in the e-commerce ecosystem are using recommendation agents as one of the strategic technological investments to increase sales and user satisfaction (Yang et al., 2017).
As the intelligent systems responsible for gathering, processing and analyzing consumer data increase in accuracy and predictive power, they gain new adopters for very different purposes. On one hand, streaming channels use algorithms to group consumers into different categories and suggest films and music of interest to them. The same rationale is used by social networks and news websites to tailor and recommend content according to each consumer's observed characteristics. On the other hand, commercial websites try to increase sales performance either by making the decision process easier, playing the role of decision aids, or by making personalized offers according to consumers' inferred preferences and needs.
It is therefore paradoxical that consumers who browse products online often leave the website without buying and do not return (Lambrecht and Tucker, 2013). Even though most online recommender systems derive recommendations from real behavior, which should increase the objectivity of recommendations, they do not perform better than other, subjective customer feedback mechanisms such as reviews and ratings (Garfinkel et al., 2006). Research so far has sought new methods to improve the accuracy of these systems, especially by trying to make their recommendations more precise. In the behavioral field, research has concentrated on the benefits of using recommendations in the decision-making process.
To date, mainstream research on the use of recommendations in online purchasing has been primarily concerned with their role in saving decision effort and increasing decision accuracy (e.g., Häubl and Trifts, 2000; Gretzel and Fesenmaier, 2006; Häubl and Murray, 2006; Xiao and Benbasat, 2014). These studies, however, start from the assumption that recommendations are always welcome and easily accepted (Bechwati and Xia, 2003; Shani and Gunawardana, 2011). Although such assumptions seem to work well in controlled laboratory environments, the influence of recommendation agents in real purchase situations is not so straightforward, especially in view of the low rates of return they yield in field experiments (e.g., Goel and Goldstein, 2013; Lambrecht and Tucker, 2013).
One could easily argue that this might be a problem of system accuracy. In other words, consumers may not be purchasing the recommended products because the recommendations they are shown do not match their expectations properly. Netflix offered a one-million-dollar prize, awarded in 2009, to anyone who could increase the predictive accuracy of its recommendation algorithm (Koren et al., 2009). Even then, Netflix never actually implemented the winning algorithm because, according to the company, the additional accuracy gains did not justify the engineering effort needed to bring them into a production environment (Holiday, 2016). This suggests that other factors related to consumer behavior need closer examination in order to shed light on this apparent contradiction. One way of doing that is to shift the perspective of analysis to the consumers' point of view.
As will be shown in the literature review, consumers' responses to recommendations have already been thoroughly studied for cases in which the recommendation system succeeds in presenting good suggestions. On the other hand, little research has investigated cases in which recommendations fail to address consumers' preferences adequately, and the responses consumers give to such failures. Exceptions are the work of Fitzsimons and Lehmann (2004), who investigated reactance to recommendations, and that of Lambrecht and Tucker (2013) on retargeting. Their research, however, was concerned with some very specific aspects of consumers' browsing behavior and did not consider a broader framework.
A more thorough attempt to look at failures in online shopping is made by Tan et al. (2016). Their work established important parameters for the analysis of failure in electronic services. Nevertheless, Tan et al.'s (2016) proposition is concerned specifically with purchasing behavior. Although the present research is theoretically related to Tan et al.'s (2016), there is a very important distinction regarding research scope. In this paper we focus on Internet consumption and the way it is affected by personalized recommendations.
With that in mind, the present research intends to contribute to theory and practice in information systems management research by investigating how consumers respond to such messages. The results of this study could be used to understand how to identify a recommendation failure and what can be done to alleviate the possible negative consequences such an event can have on online purchasing behavior.

RECOMMENDATION AGENTS AS TOOLS FOR PERSONALIZATION
The methods used to generate personalized content on the Internet are part of a relatively recent research stream grouped around the term recommendation agents. As personalization systems, recommendation agents can help consumers to make purchase decisions at a certain point in time by giving them advice tailored specifically to their needs (Shani and Gunawardana, 2011). Therefore, from a consumer's perspective, recommendation agents have the potential to reduce decision-making effort and increase decision accuracy (Dellaert and Häubl, 2012; Shani and Gunawardana, 2011; Xiao and Benbasat, 2007, 2014).
They also significantly affect other decision processes and outcomes, such as perceived cognitive effort (Aljukhadar et al., 2012; Wang, 2005; Häubl and Trifts, 2000), confidence in the purchase decision (Pu et al., 2011; Cosley et al., 2003), website trust (Tan et al., 2012; Komiak et al., 2005) and different types of satisfaction: satisfaction with the system (Knijnenburg and Willemsen, 2009; Zins and Bauernfeind, 2005), satisfaction with the search process (Punj and Moore, 2007) and satisfaction with the decision (Hostler et al., 2005; Pedersen, 2000; Vijayasarathy and Jones, 2001). In this sense, recommendation agents can influence not only the way users make decisions while searching for product alternatives, but also which, among all available options, they will consider.
Given their potential to reduce the effort required to make a decision and to increase decision accuracy, recommendation agents can also impact e-vendors' strategies and revenues (Hinz and Eckert, 2010; Oestreicher-Singer and Sundararajan, 2010). Nevertheless, the complexity of the variables that intervene between the recommendation process and users' responses calls for new research capable of unveiling some of these issues. The next subsections review previous investigations and summarize what has been discovered so far.

MEASURES OF RECOMMENDATION AGENTS PERFORMANCE
Before establishing the measures used to determine an agent's performance, it is important to note that recommendation agents are intended to reconcile the interests of both users and merchants (Häubl and Murray, 2006). Therefore, their performance is bounded by their ability to address both parties' interests in a balanced way. From a merchant's perspective, there is a clear interest in increasing sales revenues, but in a sustainable manner, which is attained by keeping the customer satisfied and loyal. The studies by Chen et al. (2004) and Oestreicher-Singer and Sundararajan (2010) linked the impact of recommendations to sales performance and confirmed that sales were affected by the strength of the recommendations and also by the number of reviews that the product receives. Zhou et al. (2011) also show that recommendations account for about 30% of overall video views on YouTube. Similar results were obtained by Hinz and Eckert (2010) and Zhou et al. (2011) studying video streaming.
For consumers, on the other hand, a recommendation can represent either a vaguely annoying invasion of privacy or a big help in bringing order to a sea of choices (Flynn, 2006). In this sense, recommendation acceptance could be considered the ultimate measure of a recommendation's performance for both sides. Accepting a recommendation means that the consumer analyzed the alternative proposed by the recommendation agent and considered it the best option among all those available. This will happen if the recommendation presented is in accordance with the individual's personal preferences (Komiak and Benbasat, 2006) or if she believes the recommendation agent is operating in her best interests (Wang and Benbasat, 2009; Häubl and Murray, 2003). Usually, online acceptance is a measure of accuracy, and it can be calculated by different methods, such as: (i) selection of non-dominated alternatives, (ii) utility values, (iii) selection of target choice and (iv) selection of target choice among k-best items (Zhang and Pu, 2006).
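As an illustration of measure (iv), the following minimal Python sketch checks whether a consumer's target choice appears among the k best-ranked items and averages that over several sessions. The function names and the toy sessions are hypothetical and are not taken from Zhang and Pu (2006); the sketch only shows one plausible way such a hit rate could be computed.

```python
from typing import List, Sequence, Tuple


def target_in_top_k(ranked_items: Sequence[str], target: str, k: int) -> bool:
    """Return True if the target choice is among the first k recommended items."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return target in ranked_items[:k]


def hit_rate_at_k(sessions: List[Tuple[Sequence[str], str]], k: int) -> float:
    """Share of sessions in which the target choice was among the k best items."""
    if not sessions:
        raise ValueError("at least one session is required")
    hits = sum(target_in_top_k(ranked, target, k) for ranked, target in sessions)
    return hits / len(sessions)


# Hypothetical data: (ranked recommendations, item the consumer actually chose).
example_sessions = [
    (["a", "b", "c"], "b"),
    (["x", "y", "z"], "z"),
]
print(hit_rate_at_k(example_sessions, k=2))
```

In this toy example only the first session counts as a hit for k = 2, because the second consumer's choice was ranked third.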
Another important measure of recommendation agent (RA) performance from a consumer perspective is cognitive effort. Wang (2005) argues that consumers' cognitive effort plays an important role in their evaluation and acceptance of recommendation agents. It is also argued that consumers tend to focus more on reducing effort than on increasing decision quality, because feedback on effort expenditure can be accessed immediately while feedback on accuracy is subject to both delay and ambiguity (Wang, 2005). Accordingly, if two strategies produce the same level of accuracy, the one expected to require less effort will be preferred (Todd and Benbasat, 1994).
Cognitive effort is frequently measured objectively in two ways: (i) consideration set size and (ii) decision time (Wang, 2005). A consideration set comprises the options a consumer considers seriously before making a decision (Häubl and Trifts, 2000). Consequently, a consideration set with too many options will demand higher cognitive effort than a smaller one. Recommendation agents can actually decrease set size when consumers find them trustworthy (Häubl and Murray, 2003; Häubl and Trifts, 2000). Another measure of cognitive effort is decision time, which can be computed directly from the time consumers spend making a decision. Some authors have also argued for the use of indirect measures of cognitive effort, such as perceived cognitive effort (Kurzban et al., 2013; Kleijnen et al., 2007; Cooper-Martin, 1994). They argue that users' perception of cognitive effort can be more decisive for intention and future behavior because it deals with the impressions primed in the consumer's memory, especially because a consumer will rarely monitor the exact time spent making a decision. There is also evidence in the literature supporting a link between subjective evaluations and both adoption intention and adoption behavior (Gefen, 2003; Venkatesh, 2000). It is also important to acknowledge that for some specific products (which can vary from user to user) the amount of effort spent to make a decision may not be as important as accuracy. That would be the case, for example, for product categories in which the consumer has previous domain knowledge or for those about which the consumer may even enjoy deciding (such as movie options, restaurants or vacation destinations).
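The two objective measures can be sketched in Python as follows. The session record and its field names are assumptions made only for illustration; none of the cited studies prescribes this data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecisionSession:
    """Hypothetical log of one purchase decision (all fields are assumptions)."""
    seriously_considered: List[str]  # ids of products examined in detail
    started_at: float                # seconds on a shared clock
    decided_at: float                # seconds on the same clock


def consideration_set_size(session: DecisionSession) -> int:
    """Number of distinct options the consumer considered seriously."""
    return len(set(session.seriously_considered))


def decision_time(session: DecisionSession) -> float:
    """Time elapsed between the start of the task and the final decision."""
    if session.decided_at < session.started_at:
        raise ValueError("decision cannot precede the start of the session")
    return session.decided_at - session.started_at


session = DecisionSession(["a", "b", "a", "c"], started_at=10.0, decided_at=95.5)
print(consideration_set_size(session), decision_time(session))
```

Repeated views of the same product are counted once, on the assumption that the consideration set is a set of distinct alternatives.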
Intention to use has been considered another important measure of RA success. Some studies have demonstrated that effort and quality are two important variables influencing users' choice behavior and their intentions to use decision aids (e.g., Payne, 1982; Todd and Benbasat, 1992). Dabholkar and Bagozzi (2002) propose a model to measure intention to use an online system based on the reported probability of using it in the future. Wang and Benbasat (2009) also developed a similar scale, adapted from Davis (1989), to be used specifically with decision aids.
Finally, satisfaction has also been considered an important driver of future behavior and an important measure of RA success. Research has considered three types of satisfaction as dependent measures resulting from RA use: satisfaction with the system (e.g., Knijnenburg and Willemsen, 2009; Zins and Bauernfeind, 2005), satisfaction with the search process (e.g., Punj and Moore, 2007) and satisfaction with the decision (e.g., Hostler et al., 2005; Pedersen, 2000; Vijayasarathy and Jones, 2001).
On the other hand, little effort has been made to measure undesirable responses to recommendations. A notable exception can be found in the work of Fitzsimons and Lehmann (2004). According to them, although much of the literature suggests that opinions and recommendations are desirable in decision-making, this only holds when the recommendation is consistent with individual choice preferences. Consequently, when recommendations contradict the consumer's initial impressions of choice options, there will be an increased level of difficulty in making the decision and, at the same time, an individual tendency to choose the alternative rejected by the recommender (Fitzsimons and Lehmann, 2004).
This kind of response can happen when the individual feels that, rather than serving as a mechanism for facilitating decision-making, the recommendation agent is purposely limiting the consideration set, restricting her freedom of choice. According to Fitzsimons and Lehmann (2004), who build on the theory developed by Brehm in 1960, threats to freedom can motivate an individual to adopt behaviors that seek to regain the freedom threatened or lost, even if these behaviors are not congruent with their immediate needs. The motivation to recover this freedom is called psychological reactance. Fitzsimons and Lehmann (2004) believe that reactant behavior can be stimulated when recommendations are unwanted. They found that when the recommendation is contrary to personal choice preferences, some undesired patterns emerge. As decision-making difficulty increases, given the conflicting information, choice of and confidence in the non-recommended alternative also increase significantly, giving room for reactant behavior. Lee and Lee (2009) reached convergent conclusions in an experimental study at an e-commerce store. Their empirical results showed that user expectations for personalized service induce the perception of usefulness, because choosing among too many alternatives may be a nuisance to the decision maker. Wang and Benbasat (2009) investigated the impact of perceived restrictiveness on user behavior and found that it significantly affects perceived cognitive effort, advice quality and consumers' intentions to use online decision aids. They also found that decision strategy plays a significant role in perceived restrictiveness, in that "the additive-compensatory aid is perceived to be less restrictive, of higher quality, and less effortful than the elimination aid, whereas the hybrid aid is not perceived to be any different from the additive-compensatory aid" (Wang and Benbasat, 2009, p. 293). Table 1 presents the dependent measures described above as possible responses to recommendations.
In line with the aim of the present work, we now focus on the specific cases in which the recommendation process does not generate the expected outcomes. First, however, we propose a theoretical definition for these situations, considering the specificities of the use of recommendation agents in e-commerce stores. To this end, we consider every case in which such an unexpected outcome is generated to be a recommendation failure.
Given the role of recommendation agents as a distinct part of the whole online purchasing process, current definitions of service failure (e.g., Adams, 1965; Walster et al., 1973; Bitner et al., 1990; Smith et al., 1999) or e-commerce failure (e.g., Tan et al., 2016) may be inappropriate for defining recommendation failure precisely. This deficiency brings a related problem, which is to understand how the failure to provide good recommendations is perceived by the user. In addition, there is a need to better understand the consequences of a failure for the subsequent actions of a consumer purchasing online. The following section discusses this still underinvestigated topic in order to propose a theoretical definition of recommendation failure.

THE PROBLEM OF FAILURE
A common definition of failure in Philosophy characterizes it as something opposed to success (Desmond, 1988). Failure, in that case, would be a difficulty in achieving a goal completely. It is interesting to note, however, that given the impossibility of complete success in every endeavor, humans are usually prone to accept minor personal failures in order to reach bigger goals in their lives (Feltham, 2012). The same occurs when people relate to others. They usually overlook some possible failures of a third party depending on the gravity of these failures, the level of relationship with this other party and the level of intentionality they attribute to it.

[Table 1. Columns: Variable; Ways of measuring.]
In the services context, for example, it is well known that occasional failures are not rare and are sometimes even expected. Even when expected, however, service failures can have permanent negative effects on companies if not treated appropriately, because of the strong emotional responses they trigger. According to the most widely accepted definition, service failure can be understood as an experience of loss incurred by the client during an encounter in which the service company should provide a gain or benefit (Adams, 1965; Walster et al., 1973). Essentially, any service failure represents a negative experience for the consumer at the moment of its occurrence and, therefore, has implications for the creation and maintenance of the relationship (Bitner et al., 1990; Tax et al., 1998).
Even though e-commerce can also be classified as a type of service provision, which allows us to use some of the theoretical basis from services and marketing research, e-commerce service failure needs a distinct view. This is especially because of the ease with which online consumers can switch from one website to another and because of the lack of physical contact during the purchase process (Xu et al., 2013). For Cenfetelli et al. (2008), service failures may invoke enduring and temperamental responses from the consumer because they are more likely to arouse negative emotions. They believe that "system success, in the context of e-commerce transactions, is rooted in the capacity of self-service applications to deliver a rewarding customer service experience on a consistent and recurring basis" (Cenfetelli et al., 2008). Tan et al. (2016) recently focused their attention on service failure in e-commerce, since this is a very sensitive determinant of sales success. They define e-commerce service failure as a "negative event that occurs whenever the e-commerce website is incapable of offering the necessary technological capabilities essential for a consumer to accomplish his/her transactional activities and/or objectives" (Tan et al., 2016, p. 3). Tan et al. (2016) reported three groups of e-commerce failures: (i) informational failure; (ii) functional failure; and (iii) system failure. The authors argue that informational failure "is a major deficiency of e-commerce websites and that it occurs whenever information provided on an e-commerce website hinders consumers in accomplishing their transactional activities and/or objectives" (Tan et al., 2016, p. 6). In this sense, informational failures can stem from inaccuracy, incompleteness, irrelevance or untimeliness of the information.
In addition, Tan et al. (2016) define a second type, functional failures, as those considered "to have occurred whenever functionalities provided on an e-commerce website are incapable of supporting consumers in accomplishing their transactional activities and/or objectives". The five forms of functional failure they identify are needs recognition failure, alternatives identification failure, alternatives evaluation failure, acquisition failure and post-purchase failure.
Finally, Tan et al. (2016) propose a third type of e-commerce service failure, the system failure. They characterize it as occurring "whenever service content (i.e. information and functionalities) offered by an e-commerce website is not delivered in a conductive manner that facilitates consumers in accomplishing their transactional activities and/or objectives" (Tan et al., 2016, p. 8). The subtypes of system failures are inaccessibility, non-adaptability, non-navigability, delay and insecurity.
From the literature review, it is possible to infer that, although the definitions used to classify service failure and e-commerce failure can be helpful for understanding the concept of failure in online environments, they are not precise enough to describe what could be considered a recommendation failure. Our argument is based on the accessory role played by recommendation agents in the online purchasing process. By accessory role, we mean that recommendation agents are usually not the focus of the purchase itself, but act as decision tools for the consumer. In this sense, recommendation failure could assume distinct forms and bring distinct consequences.
Because they are not the focus of the purchasing process itself, when recommendation agents fail to meet users' needs, they do not necessarily trigger feelings of loss in the way a service failure would. Analogously, a recommendation failure would not prevent the consumer from accomplishing his/her transactional activities and/or objectives in the way an e-commerce service failure does under Tan et al.'s (2016) definition. From these considerations, we advocate the need for a distinct definition of recommendation failure.
Considering the aforementioned accessory role of recommendation agents as decision aids, we propose that a recommendation failure will happen in online environments whenever users notice a discrepancy between some personalized message and their own momentary perceived interests. The word "momentary" has a central role in this definition, since we consider that consumers do not always have predefined, specific preferences. That being so, the success or failure of a personalized message will be intrinsically dependent on the context and the current state of the user. This is in accordance with the constructive processing perspective, which states that consumers tend to construct their preferences on the spot when they are prompted either to express an evaluative judgment or to make a decision (Bettman et al., 1998).
We also used the conceptual approach proposed by Nickerson et al. (2013) to derive a classification system based on three possible sources of a recommendation failure: (i) content; (ii) format; and (iii) context. Content failures refer to cases in which the system fails to reach an accurate prediction of the consumer's preferences and, therefore, recommends something that is not in accordance with the consumer's perceived needs, that is, a product the consumer does not really recognize as the best option. Failures in the presentation format occur whenever the system does not provide appropriate or useful information to help consumers recognize the recommended option as the most appropriate for their needs. Context-based failures occur when the recommendation is not welcome because the moment or place is not appropriate. In the sections that follow, we further develop this concept based on primary data collected with the research method detailed next.
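The three sources of failure can be represented as a small data structure, sketched below in Python. The enumeration mirrors the classification in the text, whereas the coding rules and the `classify` helper are hypothetical illustrations and not the coding scheme actually applied to the data.

```python
from enum import Enum
from typing import Optional


class FailureSource(Enum):
    """Three possible sources of a recommendation failure."""
    CONTENT = "content"
    FORMAT = "format"
    CONTEXT = "context"


# Hypothetical coding rules: a short reason (lowercase) mapped to its source.
CODING_RULES = {
    "recommended item does not match perceived needs": FailureSource.CONTENT,
    "information shown does not help evaluate the item": FailureSource.FORMAT,
    "moment or place is not appropriate": FailureSource.CONTEXT,
}


def classify(reason: str) -> Optional[FailureSource]:
    """Return the failure source for a coded reason, or None if it is not covered."""
    return CODING_RULES.get(reason.strip().lower())


print(classify("Moment or place is not appropriate"))
```

Returning None for uncovered reasons keeps the sketch honest about incidents that the three categories may not capture.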

RESEARCH METHOD
Given the exploratory nature of this research, we decided to conduct a qualitative field survey with data collection strategies adapted from the critical incident technique. This technique has been used in the social sciences, more specifically in Psychology, since the 1940s to understand individual behavior when critical incidents occur (Flanagan, 1954). More recently, it has been applied in research related to many fields of management (e.g., Tan et al., 2016; Potts et al., 2017; Salo and Frank, 2017; Kinnunen et al., 2017).
According to Flanagan (1954), the critical incident technique "consists of a set of procedures for collecting direct observations of human behavior in such a way as to facilitate their potential usefulness in solving practical problems and developing broad psychological principles". The critical incident technique outlines procedures for collecting observed incidents having special significance and meeting systematically defined criteria (Serrat, 2017).
We consider the critical incident in this case to be any event in which the consumer has consciously recognized a recommendation agent's failure to provide good advice. We chose to use a self-report questionnaire because it would give respondents the freedom to think about their answers without the pressure of having to say something right away. We also chose to collect the data online because we considered it would prompt participants to remember the recommendation incidents more easily. As recommended by Serrat (2017) and Flanagan (1954), participants were asked to provide a detailed description of the incident, indicating the website they were accessing and the reason why they believed the recommendation was not appropriate. We also asked them to describe their reactions (see Hettlage and Steinlin, 2006) when they realized the recommendation was a failure.
Data were collected during January and February of 2017, and the respondents were part of a multicultural group of university college residents at a Canadian university. Subjects were invited via e-mail to participate and answer the online questionnaire. A filtering question asked respondents to report whether they could recall any incident related to recommendation failure. After data collection, all questionnaires that passed the filtering question were reviewed, and those that did not describe the incident with the required precision were eliminated.
For data analysis, we followed the procedures proposed by Weber (1990) and Neuendorf (2016). All the answers to open questions were read individually by both authors and, after a second reading, rules of classification emerged from the data. For this classification, three dimensions were established as topics: (i) type of incident reported, (ii) response to the incident and (iii) website category. These topics were defined based on the research questionnaire.
We used them to create categories into which the answers were classified. For the three topics, each answer was considered individually by each researcher and reduced to an expression that could encompass a large number of similar incidents. The categories that emerged from both researchers were then compared, and a consensual agreement was reached for each topic, as shown in Chart 1.
A third researcher, who was not involved in the data collection process, helped to confirm the categories by classifying the answers once more according to the description of the categories. In this final verification, the proposed classification system proved to be accurate. After that, we compared the results obtained with the theoretical background, which allowed us to elaborate a comprehensive framework for addressing recommendation failure. This framework will be presented in the discussion section of this paper.

RESULTS
We collected data from a multicultural group of people from different countries, based on a convenience sample. The invitations to participate in the survey were sent both to e-mail lists and to social media contacts. We received 206 responses, of which only 105 passed the filtering question. Inadequate answers were removed either because they had missing data or because the description of the incident was not precise enough to be considered accurate (Flanagan, 1954). In the end, a total of 86 questionnaires were considered valid. Table 2 summarizes the main findings of the survey.
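The screening steps reported above can be expressed as simple retention rates. The short Python sketch below uses only the three counts given in the text (206, 105 and 86); the variable and function names are illustrative assumptions.

```python
RESPONSES_RECEIVED = 206  # questionnaires returned
PASSED_FILTER = 105       # respondents who recalled a recommendation failure
VALID = 86                # questionnaires kept after the precision review


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


funnel = {
    "passed_filter_of_received": percent(PASSED_FILTER, RESPONSES_RECEIVED),
    "valid_of_passed": percent(VALID, PASSED_FILTER),
    "valid_of_received": percent(VALID, RESPONSES_RECEIVED),
    "removed_after_review": PASSED_FILTER - VALID,
}
print(funnel)
```

On these counts, roughly half of the responses passed the filtering question, and about four in five of those were retained after review.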
After a first reading, the descriptions of the incidents were grouped and coded according to the categories that emerged from the reports. We identified five main types of incidents mentioned by the respondents. The most cited incidents (40.0% of the total) referred to inaccurate recommendations, which suggests that improving system accuracy is still a major determinant of recommendation agents' performance. This corroborates the propositions of Bang and Wojdynski (2016), Aljukhadar et al. (2012) and Chen et al. (2004), for whom system accuracy continues to be the main indicator to be improved in recommendation research.
In cases of inaccuracy, subjects reported situations in which the recommended product, service or content was not in accordance with their own preferences. Respondents described a broad range of occurrences of recommendation inaccuracy, related to products, services or information content (for social media). Following Tan et al. (2016), inaccurate recommendations were classified in our theoretical framework as content failures, because they related to situations in which the failure occurred in the core function of the recommendation agent.
Another large group of people (27.06%) reported incidents in which they received personalized advertising for recently searched products on every website they visited. Many of them complained specifically of seeing recommendations to buy products they had already bought. These cases were considered context failures, because they did not actually refer to wrong content; instead, they were related to an inconvenience perceived by the consumer. There were also two cases in which consumers reported an incorrect way of presenting the recommendation, one by giving a wrong impression about the service it was suggesting and another by labeling the recommendation in an inadequate manner. These were classified as format failures.
Inappropriate recommendations were reported by 8.2% of the participants. We classified the incidents reported as cases of inappropriateness whenever the description included situations with delicate products or services, but did not explicitly complain of inaccuracy. In this category, we included mainly the cases where respondents received recommendations of sexual content and other sensitive products, such as weight-loss messages and medicine for venereal diseases. These cases were also classified as context failures. These failures proved to be the most sensitive in terms of emotional response because they dealt with personal perceptions and self-image. In these cases, what seemed to determine the perception of failure was not accuracy, but inconvenience. This inappropriateness could be the result of a lack of context, which means that, in such cases, context-awareness could improve recommendation effectiveness by finding the right moment and the right way of presenting recommendations, as proposed by Salo and Frank (2017).
... questionnaire. There was a filtering question asking respondents to report whether they could recall any incident related to recommendation failure. After data collection, all questionnaires that passed the filtering question were reviewed, and those that did not describe the incident with the required precision were eliminated.
For data analysis, we followed the procedures proposed by Weber (1990) and Neuendorf (2016). All the answers to open questions were read individually by both authors and, after a second reading, rules of classification emerged from the data. Three dimensions were established as topics for classification: (i) type of incident reported, (ii) response to the incident and (iii) website category. These topics were defined based on the research questionnaire.
We used them to create the categories into which the answers were classified. For the three topics, each answer was considered individually by each researcher and reduced to an expression that could encompass a great number of similar incidents. The categories that emerged from both researchers were then compared with each other, and a consensual agreement was reached for each topic, as shown in Chart 1.

Chart 1. Categories emerging from content analysis.
Source: The authors.
A third researcher who was not involved with the data collection process helped to confirm the categories by classifying the answers once more, according to the description of
One other surprising result was the number of respondents who reported problems with reviews from other customers on the websites where they were purchasing. This was not initially in the scope of this paper, but we decided to include these cases in our framework for two reasons. One reason is the frequency with which these problems were reported, which indicates that the problem is an important one. The main reason, however, is that we believe that some of the rationale already used by automated recommendation agents could be applied in the customer reviews section to improve the matching between reviewers and consumers. We decided to classify review problems as format failures because they are not related to the content itself, but to the way this content is organized and presented to the user.
Finally, we also categorized the reactions reported by the participants into one of the following categories: (i) ignoring the recommendation; (ii) having an emotional reaction; and (iii) retaliation. Emotional reactions were then subdivided into negative feelings and evasive feelings. As negative feelings we considered reports of being upset, insulted, offended and irritated. As evasive feelings we considered frustration, disappointment and suspiciousness. As retaliation, participants cited either reactant behaviors, such as buying an option different from the recommended product, or unsubscribing or reporting the incident to the website owner.
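The coding scheme described in this section can be written down as a small lookup structure. The following is a minimal Python sketch, not the authors' coding instrument: the identifier names and the short incident labels are hypothetical, chosen here for illustration, while the mapping of incident types to content, context and format failures and the reaction categories follow the descriptions in the text.

```python
from enum import Enum


class FailureCategory(Enum):
    """The three failure categories used in the framework."""
    CONTENT = "content"
    CONTEXT = "context"
    FORMAT = "format"


# Hypothetical short labels for the five incident types described in the text,
# each mapped to the failure category the text assigns to it.
INCIDENT_TO_CATEGORY = {
    "inaccurate recommendation": FailureCategory.CONTENT,
    "stalking ads": FailureCategory.CONTEXT,
    "inappropriate recommendation": FailureCategory.CONTEXT,
    "misleading presentation or label": FailureCategory.FORMAT,
    "customer review problem": FailureCategory.FORMAT,
}

# Reaction categories and the examples the text lists for each of them.
REACTIONS = {
    "ignored": [],
    "emotional: negative feelings": ["upset", "insulted", "offended", "irritated"],
    "emotional: evasive feelings": ["frustration", "disappointment", "suspiciousness"],
    "retaliation": ["reactant behavior", "unsubscribing", "reporting the incident"],
}


def incidents_in(category):
    """Return the incident labels classified under a failure category, sorted by name."""
    return sorted(
        label for label, cat in INCIDENT_TO_CATEGORY.items() if cat is category
    )


for category in FailureCategory:
    print(category.value, "->", incidents_in(category))
```

Keeping the mapping in one place makes it easy to re-tabulate coded answers by failure category or by reaction when the scheme is revised.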
Based on these results, we propose a comprehensive framework for analyzing these incidents, looking at the antecedents and the consequents of recommendation failure. Figure 2 outlines a visual representation of the proposed model. One central issue to understanding this framework is the perception of failure.
We also performed a chi-square test to understand whether the responses were associated with the importance attributed to the product and with the type of incident reported. The results suggest that both the importance attributed to the product (χ² = 12.67, 6 d.f., p < 0.05) and the type of failure (χ² = 27.96, 6 d.f., p < 0.001) could be determinants of the user response to failures.
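The reported statistics can be checked against the chi-square distribution. The sketch below, in plain Python, assumes the test was a Pearson chi-square test of independence on a contingency table (the text reports only the χ² values and the degrees of freedom). The function names and the example counts are hypothetical and are not the study's data. For an even number of degrees of freedom the upper-tail probability has a closed form, which is used here.

```python
import math


def chi2_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with even df.

    For df = 2k: P = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical counts (NOT the study's data): 3 response types x 4 importance
# levels, which gives (3 - 1) * (4 - 1) = 6 degrees of freedom.
example_table = [
    [12, 8, 5, 3],
    [4, 9, 10, 6],
    [2, 4, 7, 15],
]
example_stat, example_df = chi2_statistic(example_table)

# Tail probabilities for the two statistics reported in the text (6 d.f.).
p_importance = chi2_sf_even_df(12.67, 6)
p_failure_type = chi2_sf_even_df(27.96, 6)
print(example_df, p_importance < 0.05, p_failure_type < 0.001)
```

Under these assumptions, both reported values fall below the significance thresholds stated in the text.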
As would be expected, when the product is considered unimportant to the user, the usual response tends to be simply to ignore the recommendation. At the same time, in the cases in which the recommendation was considered inappropriate, subjects reported stronger reactions, either an emotional response or a concrete action, such as complaining about the incident to the website owner. In the cases where the product had slight importance, the most reported reaction was an emotional response, and when the product was considered very important, subjects reported either emotional responses or concrete action, depending on the gravity of the failure.
The responses to the type of failure followed a similar pattern. When the recommendation was considered to be inaccurate or inappropriate, a large share of the respondents reported having simply ignored the message. The stalking ads and the peer reviews, in turn, seemed to be much more prone to trigger emotional responses. Respondents were also more prone to take action when the source of failure was the reviews of other consumers on the website.

CONCLUSIONS
Personalization has emerged as a powerful tool for facilitating consumers' decision-making process and for increasing website performance. It has also proved to be an important and useful instrument for helping users deal with information overload. However, even though personalization systems theoretically have the potential for all these things, in practice their efficiency still seems to be distant from expectations. This initial and exploratory work sought to raise some of the reasons for this discrepancy from the users' point of view. It was also our intention to propose an initial framework that could be used by both practitioners and researchers in their efforts to improve such systems.
Although, in accordance with previous research, accuracy proved to be the most important determinant of failure (e.g., Häubl and Trifts, 2000; Gretzel and Fesenmaier, 2006; Häubl and Murray, 2006), the results suggest that it should not be considered the only measure for deciding whether or not to present a recommendation. Instead, a new measure of appropriateness that takes into account not only accuracy but also pertinence and product sensitivity could be useful to increase recommendation acceptance and users' perceptions of a website and its utilities. Additionally, although the majority of the respondents reported having simply ignored the recommendation when the failure occurred, which seems harmless to the website owner, this may become a problem in the long term if, after repeated incidents, consumers simply start to ignore every offer, even those that are in accordance with their preferences. Research data suggest that the more recurrent users perceive failure incidents to be, the more prone they are to ignore personalized recommendations.

Chart 2. Comprehensive framework for analyzing recommendation failure.
Source: The author.
FÁBIO VERRUCK, WALTER MEUCCI NIQUE
We hypothesize, in this case, that a complex process takes place that leads to an unwanted outcome. When they are subjected to a personalized offer at an unexpected moment, consumers engage in a decision process in which they are prompted to analyze the offer and decide what to do with it. Even though this process may be almost automatic, when executed too many times it imposes an additional cognitive load on the consumer. This suggests that, after a few failure events, consumers may create new heuristics to cope with such unwanted messages by simply ignoring them whenever they are presented. This is in line with the idea of Germanakos and Belk (2016, p. 34), who argue that "when the rate and intensity of inflow of such information increases, and exceeds the decay rate or shifting "useless" information, its effectiveness and efficiency is dropped, causing confusion".
We highlighted three possible categories of failure, which can be a consequence of problems with the content, the format or the context in which recommendations are presented. These failure events only trigger behavioral responses when they are actually noticed by the consumer. We believe that, when this happens, consumers will engage in an elaboration process that will alter their browsing patterns until they find the proper response to that failure. This is in line with the ideas presented in the previous paragraph and with the aforementioned ideas of Germanakos and Belk (2016), according to whom any stimulus that comes from the individual's surrounding environment and is detected by the human senses is briefly available in sensory memory. According to them, this temporary retention of information as it enters the brain is also called the sensory buffer, because it concerns information detected by the senses and not yet passed on for further processing and interpretation in the human brain. This can have an important impact on attention to recommendations in the long term, as we mentioned above.
Another important finding of this study is that, in their responses, some subjects reported having searched for a product or looked for specific information just out of curiosity, without being willing to purchase anything. Similarly, subjects reporting incidents with social media recommendations described having clicked on a specific link and afterwards receiving a stream of recommendations for similar content that was not of real interest to them. Montgomery and Smith (2009), for example, argue that clickstream is underutilized and that it is likely to take years before its potential is fully leveraged. This finding suggests that clickstream alone may not be precise enough to generate accurate recommendations. Complementarily, Koene et al. (2016) call for a new approach to this problem, considering not only clickstream and past behavior but also contextual factors, in order to better predict what and when to recommend.
This could be an important insight: if it is possible to identify changes in browsing behavior when a failure event occurs, then it may be possible to use this change as an input to the recommendation agent to correct such an incident. A similar approach has been proposed by Lu et al. (2016) to generate recommendations for garments. In their work, Lu et al. (2016) use facial expressions and eye tracking to identify consumers' reactions to recommendations made in-store, inferring likes and dislikes through facial recognition techniques and, through eye focus, the parts of the garments that were more valuable to each consumer.
The path from the perception of failure to the response also seems to be influenced by other interfering variables. In our research, we found evidence to suggest a moderated mediation between users' perception of failure and their actual response. Drawing on Rodgers and Thorson's (2000) idea that consumers' Internet motives influence the level of cognitive effort devoted to the task, we argue that the level of cognitive load involved in a certain task will influence the way consumers deal with recommendation failure. More specifically, we rely on Bang and Wojdynski (2016), who have recently found that task cognitive demand moderates the effects of personalization on attention, to argue that information seekers may be more prone to recognize failure in recommendations than entertainment seekers. Additionally, we believe that the importance attributed to the item being recommended will be an important moderator of this relation.
The main contributions of this article are threefold. First, it is the first attempt to define recommendation failures in a broad sense. Although an initial conceptualization of e-commerce service failure has already been made by Tan et al. (2016), our results show that a precise definition of failure in online recommendations must follow a different rationale. Second, it helps to define clearly the antecedents and consequents of recommendation failure from a user's perspective. Finally, it provides a conceptual framework that can be used as a starting point for future research looking to find more specific relations. As such, further theorizing or more rigorous experimental designs are needed to investigate the proposed relationships.

Figure 1. Categories emerging from content analysis.

Figure 2. Comprehensive framework for analyzing recommendation failure.

Table 1. Measures of responses to recommendations reported in previous studies.
Source: The author.
Source: Research findings.