Model 1 uses the coding scheme of Aslani, Ramirez-Martin, Brett, Yao, Semnani-Azada, Zhang, Tinsley, Weingart, and Adair (2014), “Measuring negotiation strategy and predicting outcomes: Self-reports, behavioral codes, and linguistic codes,” presented at the annual conference of the International Association for Conflict Management, Leiden, The Netherlands.1 Click here for full paper


As the authors describe in the 2014 paper, “we developed a 14-item code based on prior negotiation coding schemes (e.g. Adair & Brett, 2005; Gunia et al., 2011; Weingart et al., 1990)2 to measure participants’ use of tactics. Major categories in the code were information, offers, substantiation, negative and positive reactions, and a miscellaneous category.” The simulation used in their study was The Sweet Shop (newnegotiationexercises.com). The authors shared with us 75 transcripts they had collected and coded using human coders.


Each of the codes is shown below except for “Response Embarrassed” which appeared only once across all transcripts, making it hard to teach that code to the model. For each of the remaining 13 codes, we provide a definition of the code, a short explanation, and sample sentences from the transcripts. The sample sentences let you see how these scholars operationalized their codes, which is what Model 1 learned from and tries to reproduce when coding your transcripts. As with any coding scheme, different scholars might operationalize concepts slightly differently. You should decide if this coding scheme will be useful to you by reviewing how the authors used it.


When reporting your results from this model, please cite this paper:


Friedman, R., Brett, J., Cho, J., Zhan, X., et al. (2024). Coding negotiations with AI: Instructions and validation for coding Model 1. https://www.ainegotiationlab-vanderbilt.com/static/assets/VandAI_Neg_Lab_Model1_Paper.pdf

Below we have listed each of the 13 codes used in this coding scheme, along with a brief definition and explanation, followed by example sentences from the coded transcripts. (If you prefer to view this material in an Excel file, download this file.)

NOTE: A bold border surrounding several sentences indicates that those sentences are part of an ongoing conversation.

Code 1: Single Issue Offer

Description: Making an offer regarding one issue in the negotiation, such as "I'll give you Staff option 3." This code includes stating a preference, arguing for a position, expressing a want, making a suggestion to do something, and giving a concession.

SINGLE ISSUE OFFER - STATING YOUR OWN OFFER OR PREFERENCES
  • I feel like the most sensible thing would be for us all to get trained together.
  • Which is why I think we could maybe hire individually and train them individually.
  • Okay, so as far as temperature, I say we meet in the middle and do 73 degrees Fahrenheit.
  • Like in my case, I would prefer Temperature 1 because it’s warm and people want more ice cream when it’s hot in the store.
  • I would prefer Maintenance 1.
  • I would be willing to split the difference, up to a price of 750.
  • I would be willing to do it as Staff 3 or 5.
  • I would rather pay people who do stuff for me, and like you pay people who do stuff for you. Does that make sense?
  • I mean, I would probably rather have Staff 4 to do specific training, but I would do Staff 3. That would be okay.
  • I’d be willing to do 4 or 5 or.
  • To be honest, I would be fine with Design 6.
  • I say let’s hire my brother.
  • Yeah, you can pay me $6,000.
  • I would prefer to hire and train my staff individually.
  • But what I wanted was that if you could hire them as a group and distribute accordingly.
  • But I’ll take you into consideration and I will lower my temperature for your ice cream shop and I’ll go up to 73 degrees Fahrenheit, or actually 71.
  • I like deluxe cleaning, so I would prefer 4, but if you have problems with, if you want lesser levels of hygiene, then I might be.
SINGLE ISSUE OFFER - DEMANDING/PUSHING YOUR OFFER
  • Look at 5. Why don’t we go for 5? I pay from my profits, you pay from your profits.
  • Why do you want to hire them as a group? Let’s hire them individually.
  • That’s why I’m trying to say we should go for 5.
  • You cannot just go for 5?
  • That’s why I was saying 6. (Offer is in the prior sentence)
  • Look, give me a separate website.
  • I still believe that I should design my own and you should design your own.
  • Okay, we’re cutting costs over there. You should pitch in some money here in terms of the cleaning. It’s simple.
Code 2: Multi Issue Offer

Description: Making an offer that includes at least two different issues, such as "I'll give you staff 1 and Design 4" or suggesting a tradeoff between issues.

MULTI ISSUE OFFER - OFFER INCLUDES SEVERAL ISSUES
  • So, like, if we did Staff 6 and Design 6.
  • I would concede on design if you’d be willing to concede on staff.
  • How about I compromise on this and you compromise on another issue?
Code 3: Accept an Offer

Description: Accepting an offer on one or more issues, such as "OK, staff 1"

ACCEPT AN OFFER
  • Sure, 73 would be fine.
  • Okay. I can, I think that’s a good way to work that out.
  • We can just go with your brother designing the website.
  • Yeah, that sounds good, yeah.
  • All right, Temperature 1, that’s fine.
  • Okay. All right. And I’m, yeah, I’m okay with that if you’re okay.
  • Okay.
  • Yes, I would be, I would be willing to do that. Okay.
  • I still think that I should design my own and you should design your own, okay? Cool. Okay.
  • 75 degrees. (based on prior sentence)
  • Okay, fine.
  • Yeah.
Code 4: Reject an Offer

Description: Clear rejection of an offer, such as "no way staff 1 and design 4"

REJECT AN OFFER
  • I’m not going to go for 5, I’m sorry.
  • No, I’m not going to go for 4. It makes no sense. I’m sorry, but you haven’t.
  • I don’t want your brother to design my website.
Code 5: Question

Description: Asking a question about the other side's preferences or statements.

QUESTION - ASKING CLARIFICATION
  • So you’re the ice cream shop?
  • What do you mean? Like which one’s the most important to me?
  • All right?
  • Excuse me. Are we on the same floor?
  • It’s the same level?
  • Shared space, not on different floors?
  • Two floors?
  • What’s upstairs, though?
QUESTION - ASKING PREFERENCES
  • Okay. All right. And what would you prefer with maintenance?
  • With the delivery?
  • Cold delivery?
  • Seventy-five hundred?
  • So of the five things, what’s most important to you?
  • Sure. So what are your expectations for staff?
  • What’s the order for you?
  • So for staff you’d like Option 6?
  • And if you don’t mind me asking, which category of negotiations is your, what you get the most utility out of?
  • Than would, like, making ice cream and stuff, but, I mean, do you have a particular area of, like, that’s important, really important for you?
  • Design is important?
  • You would want more space, right, for the design?
  • Why do you want basic cleaning and not deluxe cleaning?
  • Okay, now what do you say about design?
  • I would say, would you like your staff to rest upstairs or not? Would you like to have an area for them, a storage area for yourself?
  • What about, you don’t want any office staff there?
  • They have to have the same skills, so why not hire them as a group?
  • Yes. What would you want?
  • Yeah?
  • Why do you want it to be relatively high? You’re an ice cream shop.
  • Okay, sure. Okay, tell me what temperature you would like then.
  • You want 75 degrees?
  • How should we share the cost? Because on some given days, like in summers, for example, I will have, there will be more people in the ice cream shop, right?
  • What do you want?
  • So Staff 6, Temp 3, Maintenance 3, oh, gosh, Design 1 or 6?
  • Six? And then W2?
  • Does that work better?
QUESTION - ASKING ABOUT WILLINGNESS TO CONCEDE
  • So if I compromise on staff, would you be willing to compromise on design?

The following might have been assigned different codes:

QUESTION - COULD ALSO BE "RESPONSE NEGATIVE" (SNIPING)
  • Do you not want your office staff to take a break?
  • Yeah, but how do I know that you’re telling me the truth right now? (RN?)
  • How am I supposed to trust you?
  • See how I’m compromising here?
  • We’re compromising. Will you please state that?
  • Okay, if you would only listen. If we’re sharing the space, right?
QUESTION - COULD ALSO BE "SUMMARIZE"
  • So you’re saying keep everything else the same and just switch those two to the most extreme?
  • Okay. So hire as a group and distribute according to demand for service?
  • Along with which design?
  • So basic cleaning service, equal split of costs?
QUESTION - COULD ALSO BE "PROCEDURE"
  • Okay. Do we have to discuss the website?
  • Okay. Okay. So are we done then?
  • So you have maintenance next?
  • So, I mean, do you want to go through each, like staff?
  • Done?
  • Do we have, where are we, do you have something to copy this down on or?
  • Or is it just on the recording? Yeah.
  • Are we done?
Code 6: Information

Description: Comments that provide information about self, preferences, reactions. Elaboration of a point.

INFORMATION - ABOUT OWN PREFERENCES
  • We don’t have to. It’s optional. It doesn’t really matter to me either way.
  • I would, I’d really like to renovate all of the upper level so that the customers can, I guess, like use that space, to have that entirely customer based space.
  • The most important thing for me is staff. Well, because you know I have specialized products that I need to.
  • For me, it goes my most important is staff, followed by maintenance, then design is my third. And if that’s your first, you know, we can do that then. Temperature and then this website.
  • Probably design and then temperature.
  • After that, maintenance, maintenance, staff and then the website.
  • And then in terms of design, mine would also be Option 6.
  • For, and then, okay, the last ones are temperature and maintenance and website.
  • I think the most important is probably in terms of design. Expressing priorities is very important for me.
INFORMATION - ABOUT OWN REACTIONS
  • I mean, I’m pretty pleased.
  • Okay, yeah, I really don’t like these options because they’re very, like, one or the other.
INFORMATION - ABOUT SELF AND ONE'S BUSINESS
  • I’m not sure if they are either. I mean, staffing is kind of important because of, like.
  • I have a bakery,
  • I feel like, no offense, it takes slightly more skill to make elaborate pastries and crap than it is to just make, like, ice cream.
  • For stuff and I guess my business is trying to branch out and to make, like, custom, like bakery items for people, and that would require more elaborate training.
INFORMATION - ELABORATION
  • Because they’ll be able to see how, like, multifaceted, like, our one little enterprise is.
  • I mean, it’s, I mean, there’s not a whole lot that’s difficult on it.
  • I don’t have a lot of details.
  • I am, I have confusing details about this. Essentially, like what, essentially what I have is that we are currently both getting delivery from two different sources.
  • On the counters. I don’t have any back storage.
  • We don’t use sour milk.

The following might have been assigned different codes:

INFORMATION - ABOUT WILLINGNESS TO CHANGE OFFER
  • Okay. Well, yeah, I’d be willing to, I could do Temperature 1 if.
  • If we were to combine delivery, I need the cold delivery anyway for the ice cream.
  • So essentially I would add your order to mine and then you would pay me directly.
  • Yeah, after whatever I pay.
  • And that sounds like a good idea as long as it is under $10,000. But I’m not sure, you know, how much.
  • And mine is it’s a good idea that, as long as it’s above $5,000. So, like, so we have, like, a margin between five and ten thousand dollars.
  • Well, here’s my deal.
  • Because I honestly do not know. I don’t know which one is, like, better, like numerically for you, but I’m going to look.
Code 7: Substantiation

Description: Explanations for why the other should agree to an offer or request. The explanation might be based on claims of fairness, need, rights, or power (threats, toughness, alternatives).

SUBSTANTIATION - FAIRNESS
  • My proposal is, this is just based on the information I have that I thought would be fair for the both of us.
  • And the web, if the, if, if, if we change around maintenance, we’d probably have to change around the website, too, in order to maintain fairness.
SUBSTANTIATION - I REALLY NEED IT
  • They might eat it there, but, like, but ice cream you’re going to, like, sit there and, like, enjoy your ice cream and stuff.
  • You don’t need it, right, but I do need.
  • Right. But the thing is in my ice cream parlor, I have this place for people to sit and I have to serve them as well, while your bakery just wants to concentrate on ___, right? So I need to.
  • I don’t know that someone typically working an ice cream shift could do the cake design I need them to do.
SUBSTANTIATION - POWER/THREAT
  • Right. So if you don’t want staffing, fine, you won’t have it.
SUBSTANTIATION - MY RIGHTS
  • No. Why should you have a say in my staff? It’s my staff.
  • But I want my staff. I don’t want you to make any decision or train my staff.
Code 8: Positive Response

Description: A positive response, such as "great," "yes," or "OK," and any positive mimicking (such as following "yeah" with "yeah").

POSITIVE RESPONSES
  • Okay.
  • Yeah.
  • Yeah, it’s going.
  • Oh.
  • Yeah.
  • Mm-hmm.
  • That’s good with me.
  • Yeah.
  • Okay. That was hot. Okay.
  • I am not worried.
  • Great.
  • Yeah. I think that’s it.
  • Oh, right, right.
  • That’s fine.
  • Okay, I mean, I know we’re supposed to negotiate, but, like, that kind of sounds good to me.
  • Right, mm-hmm.
  • True, we can see.
  • Yeah, basically.
  • I’m happy.
  • I’m happy, too. I’m incredibly happy.
  • Me, too.
  • Well done, team.
  • Wow. All right.
  • Yep. Okay, stop.
Code 9: Negative Response

Description: A negative response to a proposal or idea, such as "no, never, no way," and any negative mimicking (repeating the other's negative words).

RESPONSE NEGATIVE - STRONG NO
  • But I want some incentive to want to share the space. I don’t see any incentive if I have to hire individually.
  • Why staff? Well, then I’ll go find another ice cream shop, if that’s the case. You haven’t presented a good argument as to why we should share the staff.
  • No, it makes no sense to go for 4.
  • I don’t see the point of making customer people do baking stuff.
  • I don’t see why you want to train them individually when you can as well train them equally and then.
  • Look, look, look. Option 5.
  • I don’t see the point of hiring them individually and then training them together.
  • No, no, no, really.
  • Look, that is going to be very hard.
  • But if we’re training them jointly, it doesn’t make sense.
  • I don’t buy that.
  • I can’t.
  • No, no, no, no, no, no.
  • No, no, no. It’s not on the same floor. It’s not on the same floor.
  • No. No, no, I don’t want lesser levels of hygiene. I want deluxe cleaning as well.
  • Um, I’d like some time to think about it, if you don’t mind. I think it’s better off not discussing.
  • 25 percent for storage isn’t very much.
  • But you, but you pay more.
  • I don’t have any incentive.
RESPONSE NEGATIVE - SNIPING
  • We don’t keep old milk.
  • That’s exactly why.
  • Arguably so. Of course you do. I wouldn’t be surprised.
  • This is unacceptable. You are dissing my business.
  • You didn’t say this two minutes ago. You said two minutes ago that you sell rotten bread. I’m having second thoughts now.
    • I think. Still, I don’t see a reason as to why your brother should do it.
    • You’re not giving me a good argument.
    • You idiot.
    • It makes no sense.
    • But you, that’s why I’m saying you give them. Look.
    • You’re not giving me ___.
    • You’re not.
    • I’m not arguing with you.
    • You’re not out of business.
    • Yes, I am.
    • No, you’re not.
    • Fine.
    • Um.
    • No, you can’t not go for anything.
    • Well, I’m sorry, you’re being.
    • Oh my god.
    • But they’re, but you might.
    • You need to stop interrupting. Your face is so red.
    • No. I didn’t say that.
    • I’m highly offended. I think we need to cancel this contract.
    • You called my brother retarded.
    • Oh, my gosh.
Code 10: Summarize

Description: Summarizing and confirming what was said, decided, or known, such as "so you prefer x to y."

SUMMARIZE - SUMMARIZING AND CONFIRMING
  • Six, yes, which is hire and train individually, all decisions made by me.
  • Yeah, renovate all the upper level.
    • We chose Staff 1 on, yeah. No, Staff 6.
    • We said Staff 6.
    • Yeah, I think that that would work out best for us, just, you know, it’s right in the middle.
    • We’ll both be compromising the same distance.
    • Initially I said Website 2.
    • I was trying to kind of make it even with what we.
    • And so since we’ve discussed the other options, I mean, it sounds like we could be done.
    • I think we could be.
    • Mm-hmm. Okay, so let’s just confirm Design 6.
    • Maintenance 3.
    • Maintenance 3.
    • Temperature 3.
    • Temperature 3.
    • Staff 6.
    • And Website 2.
    • I’m just going to copy this. Temperature 1.
    • Maintenance 1.
    • Maintenance 1. Design 6.
    • And then Website 2.
    • Website 2. And then delivery 750, so cool.
    • Ice cream, mm-hmm.
    • You mean higher. Well, lower is better for you. Higher is better for me.
    • Higher is better for me, too.
Code 11: Procedure

Description: Comments about how to do the negotiation, such as "Can we move to a specific issue? Can we discuss one issue at a time? Can we move on to another issue? We’re running out of time." Includes comments about logistics.

PROCEDURE - PROCESS COMMENTS
  • I feel like we should be discussing, like that we should be taking longer. Whatever.
  • I didn’t really understand this when I was reading it.
  • I don’t fully understand how you got points with this, but.
  • I’m also, I mean, I guess we’re not supposed to talk about the point values, but I thought the point values they gave us were very strange.
  • I think we’re done.
  • This is all going? All right, let’s get some points.
  • All right, let’s do it.
  • So let’s talk about, of the five things, what.
  • Let’s just, let’s just go through, because this is the easiest way to, for us both to get the most we want because our priorities are not the same.
  • It’s, that’ll be the easiest way to get this done.
  • Okay. And then the last thing is the website.
  • I mean, we, yeah. That took all of, like, seven and a half minutes.
    • So staff first.
    • So maybe we should discuss them together.
    • Let me just see what this point table is, just to make sure. I’m pretty sure I’m fine, but. So, oh, wait. No, we do have to talk about website. The delivery is optional.
    • I should not have written that on that sheet. Hold on.
    • No, so I can.
    • Okay, let me do some math.
    • And because I’m lazy, it’s going to be done on my phone.
    • I’m like really, I’m lazy. I was like, calculator. I’m like, I can add these numbers in my head.
    • I guess that we’re done and we go into the main.
    • All right. So I’ve kind of put together a project proposal, if you want to listen to it.
    • I just, like, thought, like, what’s the minimum, what’s the maximum that I could get.
PROCEDURE - CHIT-CHAT, INTRODUCTIONS
    • Hey there.
    • Hi.
    • I’m Elizabeth, by the way.
    • I’m Rachel.
    • Nice to meet you.
    • You, too. Let’s do this thing.
Code 12: Miscellaneous

Description: Comments that do not fit into other codes, but are on-topic

MISCELLANEOUS - ODDS AND ENDS
  • So.
  • Um.
  • Okay, so that’s actually.
  • So I.
  • If he didn’t do it right we’d have to be like, dude, seriously. Those college kids.
  • Right, yeah. I feel, because I feel like if we start, like, messing with it too much, then, like, we’re just going to be, like, I don’t know.
  • What, okay. Let’s just.
  • Because, I mean, that’s.
  • Well, so.
  • I think that.
  • I mean, customers of both businesses, bla bla bla.
  • Which is the customer service.
  • We’re doing our customer. Look, look.
  • And we provide.
  • Because.
  • And then we.
  • You’re saying.
  • I am bringing my ___.
  • I think you are overestimating the power of it.
  • The 50 percent.
  • And then because.
  • Huh? Arguing ___, arguing ___.
  • So does my words on me. I said this two minutes ago.
  • Because we’re compromising.
  • What if I ever ___.
  • I can pass ___.
  • You, you.
  • Well, well.
Code 13: Off Topic

Description: Comments not related in any way to the negotiation

OFF TOPIC
  • Oh, that’s kind of cool.
  • The red light is lit up.
  • I’m, like, kind of tired, so I’m like, ah, so much math.
  • I know, it made me, I live right by Andy’s.
  • The custard place. So that’s what I was thinking of.
  • Mm-hmm. They really have great stuff for big bite night.
  • Oh, my, I missed that completely.
  • They were, like, a couple of my friends and I were like, this is our last year, we’re doing it, and we got there at, like, 2:45 and got everything.
  • Oh, my gosh.
  • Like they had pumpkin. It was really, really delicious.
  • That sounds wonderful.
  • So I guess we’re not supposed to share this information, but I’m assuming.
  • Look at your face.
  • He is right here in the debate club. He has problems in debating with Cornell, but they do good in ___.
  • Are you telling me that debates are retarded? So you tell me two club activities that you’ve done in your life other than these two.
  • Okay, look. I don’t get why you’re bringing in personal information and all this stuff.
  • I don’t know why you’re getting personal.

Transcripts are coded in three steps:

1. Unitization (you need to do this): The model provides one code for each set of words or sentences that you identify as a unit in your Excel document. You can choose to have units be speaking turns, sentences, or thought units. The easiest to set up is speaking turns, since switching between speakers is clearly identifiable in transcripts. The next easiest is sentences, since they are identified by one of these symbols: .?! However, different transcribers may end sentences in different places. The hardest unit to create is the thought unit, since that takes careful analysis and can represent as much work as the coding itself. (See the NegotiAct coding manual3 for how to create thought units.) Clarity of meaning runs in the opposite direction: the longer the unit, the more likely it contains multiple ideas, and the less clear it is to human or AI coders which part to code. Aslani et al. (2014) coded speaking turns, but 72% of their speaking turns contained just one sentence. The closest alignment with the training data would be for you to use sentences as the unit.
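To make the sentence option concrete, here is a minimal sketch in R of splitting speaking turns into sentence units at the .?! delimiters described above. The data frame, column names, and example turns are hypothetical; adapt them to your own transcript.

```r
# Sketch: split speaking turns into sentence units at . ? ! boundaries.
# `turns` is a hypothetical example; adapt the columns to your own transcript.
turns <- data.frame(
  SpeakerName = c("Buyer", "Seller"),
  Content     = c("I would prefer Maintenance 1. What do you want?",
                  "Sure, 73 would be fine."),
  stringsAsFactors = FALSE
)

split_into_sentences <- function(text) {
  pieces <- unlist(strsplit(text, "(?<=[.?!])\\s+", perl = TRUE))
  trimws(pieces[nchar(trimws(pieces)) > 0])
}

units <- do.call(rbind, lapply(seq_len(nrow(turns)), function(i) {
  data.frame(SpeakerName = turns$SpeakerName[i],
             Content     = split_into_sentences(turns$Content[i]),
             stringsAsFactors = FALSE)
}))

units   # one row per sentence unit, ready for the Excel template described below
```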

2. Model Assigns Code: The model assigns a code to each unit you submit, based on in-context learning. Coding is guided by the prompt we developed and tested (a schematic sketch of how such a prompt might be assembled appears after the list below). For more on in-context learning, see Xie and Min (2022). Our prompt for this model includes several elements:

  • Five fully coded transcripts. These transcripts were chosen from the 75 available transcripts in the following way. First, any combination of five was considered only if that set included all 13 codes. Second, five of those combinations were chosen at random to test. Third, the one that produced the highest level of match with human coders was retained.
  • Instructions to pay attention to who was speaking, such as “buyer” or “seller”.
  • Instructions to pay attention to what was said in the conversation before and after the unit being coded.
  • Supplementary instructions about the difference between “substantiation” and “information” since in early tests the model often coded substantiation as information, and vice versa. This confusion is not surprising since substantiation usually comes in the form of providing information, but with the purpose of supporting a specific offer or demand.
  • Additional examples of any codes where the five training transcripts did not contain at least 15 examples. We created enough additional examples (based on our understanding of the code) to bring the examples up to 15. We needed to add 12 examples of multi-issue offer, 12 examples of offer rejected, and 14 examples of Miscellaneous Off-Task.
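The full tested prompt is not reproduced here, but the sketch below illustrates the general shape of an in-context-learning prompt of this kind: coded example units are concatenated, followed by the unit to be coded. Everything in the sketch (object names, example units, wording) is a hypothetical illustration, not our actual prompt.

```r
# Schematic sketch of an in-context-learning prompt: coded example units are
# concatenated, followed by the unit to be coded. This is NOT our actual prompt;
# all names, example units, and wording here are hypothetical.
examples <- data.frame(
  Speaker = c("Buyer", "Seller"),
  Text    = c("I would prefer Maintenance 1.", "Sure, 73 would be fine."),
  Code    = c("Single Issue Offer", "Accept an Offer"),
  stringsAsFactors = FALSE
)

format_example <- function(speaker, text, code) {
  sprintf("%s: %s\nCode: %s", speaker, text, code)
}

new_unit <- list(speaker = "Buyer", text = "No, I'm not going to go for 4.")

prompt <- paste(
  "You are coding negotiation transcripts with a 13-category scheme.",
  "Pay attention to who is speaking and to the surrounding conversation.",
  paste(mapply(format_example, examples$Speaker, examples$Text, examples$Code),
        collapse = "\n\n"),
  sprintf("%s: %s\nCode:", new_unit$speaker, new_unit$text),
  sep = "\n\n"
)

cat(prompt)   # the assembled prompt would then be submitted to the language model
```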

3. We Run the Model Five Times: We automatically run the model five times to assess the consistency of results. As expected, the results are not always the same, since with in-context learning the model learns anew with each run and may learn slightly differently each time. Variation is also expected because some units may reasonably be coded in several ways. By running the coding model five times, we get five codes assigned to each unit. If three, four, or five of the five runs produce the same code, we report that code and indicate its level of “consistency” (three, four, or five out of five). If there are not at least three consistent results out of five runs, or if the model fails to assign a code, we do not report a model code. In these cases, the researcher needs to do human coding.
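As a sketch of how the consistency rule works, suppose the codes from the five runs are stored one unit per row (the column names and example codes below are hypothetical). The modal code is reported only when it appears in at least three of the five runs:

```r
# Sketch of the consistency rule used in step 3.
# `runs` holds the code assigned to each unit in each of the five runs
# (column names and example codes are hypothetical).
runs <- data.frame(
  run1 = c("Question", "Information"),
  run2 = c("Question", "Substantiation"),
  run3 = c("Question", "Information"),
  run4 = c("Question", "Substantiation"),
  run5 = c("Procedure", "Summarize"),
  stringsAsFactors = FALSE
)

consistency_for_unit <- function(codes) {
  counts <- table(codes)
  n_top  <- max(counts)
  if (n_top >= 3) {
    data.frame(ReportedCode = names(counts)[which.max(counts)],
               Consistency  = paste0(n_top, " out of 5"))
  } else {
    data.frame(ReportedCode = NA_character_, Consistency = "no code reported")
  }
}

do.call(rbind, lapply(seq_len(nrow(runs)), function(i) {
  consistency_for_unit(unlist(runs[i, ], use.names = FALSE))
}))
# Unit 1 is reported as Question (4 out of 5); unit 2 gets no code (a 2-2-1 split).
```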

Validation occurred in several steps:

Validation Step 1: Compare the model coding with the human coding by Aslani et al. (2014). To do this, we asked the model to code the 4,968 units contained in the Aslani et al. (2014) transcripts that were not selected for training. We looked at several criteria.

  • Consistency: In our test, there was “perfect” consistency of model coding (five out of five runs of the model assigned the same code to a unit) for 4,752 of the units, “high” consistency (four out of five) for 126 of the units, and “modest” consistency (three out of five) for two of the units; for 90 units the model did not report a code. Thus, 96% of codes had perfect consistency (see Table 1).

  • Match with human coding: We assessed whether the model assigned the same code as the human coders. The overall match level for units where the model assigned a code was 73% (95% CI: .72, .75). To ensure that the model was not biased toward matching the human coders more accurately in earlier or later phases of the negotiation, we tested whether the match level differed between the first and second halves of the transcripts. The match level was 74% for the first half and 72% for the second half, suggesting no bias based on phase of the negotiation. We also looked at match by level of consistency (see Table 1). These results suggest that users may want to accept model-assigned codes only where the model achieves perfect consistency (five out of five).

  • Table 1: Match Percentage by Consistency Level, Validation Step 1

    Level of Consistency               % of Coded Units at This Level*    Units Not Matching    Units Matching    Match %
    Modest Consistency (3 out of 5)    0.04%                              0                     2                 100%
    High Consistency (4 out of 5)      2.5%                               82                    44                35%
    Perfect Consistency (5 out of 5)   97.5%                              1,239                 3,513             74%

    *90 cases did not reach the 3 out of 5 consistency threshold or the model failed to assign a code

    We also calculated Cohen’s kappa, treating the model codes as coming from one rater and the human codes as coming from a second rater. This calculation, unlike the “percentage match,” accounts for matches that might occur by chance. Cohen's kappa was calculated in R (R Core Team, 2022)4 using the irr package (Gamer & Lemon, 2019)5. Cohen's kappa was 0.69, with a no-information rate of 0.27 (p-value of the difference < .001). According to Landis and Koch (1977)6 this represents "substantial agreement," and according to Fleiss (1981)7 it is "fair to good" agreement. Rather than relying on conventional categorical guidelines to interpret the magnitude of kappa, Bakeman (2023) argues that researchers should estimate observer accuracy, that is, how accurate simulated observers would need to be to produce a given value of kappa. The KappaAcc program (Bakeman, 2022)8 was used to estimate observer accuracy, which was found to be 85%.
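For readers who want to reproduce these agreement calculations on their own data, a minimal sketch using the irr package cited above follows; the human and model vectors are toy stand-ins for the two full columns of codes.

```r
# Sketch of the agreement statistics, treating the model as one rater and
# the human coders as another. `human` and `model` are toy stand-ins for
# the full vectors of codes (one element per unit).
library(irr)   # the package cited above for Cohen's kappa

human <- c("Question", "Information", "Substantiation", "Positive Response", "Question")
model <- c("Question", "Substantiation", "Substantiation", "Positive Response", "Question")

mean(human == model)                  # simple percentage match
kappa2(data.frame(human, model))      # Cohen's kappa for two raters
table(human = human, model = model)   # confusion matrix (human rows, model columns)
```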

  • Summary Data and Confusion Matrix: We created a confusion matrix for all codes with perfect consistency (see Table 2). The vertical axis shows the human coding; the horizontal axis shows the model coding. Also included below (see Table 3) are summary statistics showing which codes appeared most frequently in the human coding (Positive Response was most common, representing 25.37% of the codes, while Miscellaneous Off-Topic was least common, representing just .25% of the codes), along with the human-model match level for each code. The highest levels of human-model match were for Positive Response and Question, while the lowest were for Miscellaneous Off Topic, Negative Responses, and Summarize. There appears to be a rough correlation between the number of units and the match percentage, suggesting that the match percentage goes up when there are more examples of a code in the training transcripts for the model to learn from and when there are more opportunities to find that code in the test transcripts. Miscellaneous Off Topic has the lowest match percentage, perhaps because there are only ten of these units in the test transcripts, because it is a more ambiguous category, and/or because it was less conceptually central to Aslani et al. (2014) and their coders.

    In terms of absolute numbers of mismatches, the largest set is 124 human-coded Information codes that were coded as Substantiation by the model. This is an issue we recognized early in our testing, which resulted in added instructions in the prompt to reduce this mismatch. The fundamental problem is that Substantiation is often achieved by providing information, but to be Substantiation that information must support a particular argument or claim. There were also 81 cases of human-coded Substantiation that were coded as Information by the model. The next largest set of mismatches was 81 cases where the humans assigned a code of Positive Response while the model assigned a code of Accepting Offer, a confusion that is easy to imagine happening.

Table 2: Confusion Matrix, Validation Step 1

(Table 2 is not available in this version of the document.)

Table 3: Match Percentage by Code, Validation Step 1

Human Code % of units Across All Transcripts Model Match %
RP 25.37 88.45%
I 18.79 66.96%
Q 18.46 86.46%
OS 8.59 70.41%
PROC 7.13 57.76%
S 5.64 59.27%
SUM 5.49 36.94%
OA 4.28 69.38%
MI 3.79 63.78%
RN 1.33 40.00%
OM 0.61 60.00%
OR 0.27 38.46%
OT 0.25 8.33%

Code abbreviations (as used in Tables 3 and 6): RP = Response Positive; I = Information; Q = Question; OS = Offer (Single Issue); PROC = Procedure; S = Substantiation; SUM = Summarize; OA = Offer Accepted; MI = Miscellaneous; RN = Response Negative; OM = Offer (Multi-Issue); OR = Offer Rejected; OT = Off Topic.
Closer Look at Mismatches: To assess the nature of these and other mismatches, we selected a random sample of 100 mismatches for closer examination. Given that the original human coders may be just as likely to make errors (or simply vary in their judgments) as the model, we wanted to see whether newly trained coders would judge the Aslani-provided codes or the model-provided codes as more accurate. We trained two coders, who practiced coding transcripts until they reached a high level of agreement (kappa = .81). We then provided these coders with the 100 speaking turns, along with the two speaking turns preceding each one and the human and model codes. They were not told which code came from the model and which from the humans, and the order in which they saw the two codes was flipped halfway through the 100 samples to avoid order effects. They selected which code they saw as more accurate, first separately and then resolving through discussion any cases where they disagreed. In the end, these new coders judged the model-provided codes correct 68% of the time and the human-generated codes correct 32% of the time. Based on this, we can expect the model to be correct in about 68% of mismatched cases, so combining the 73% of units that matched with 68% of the remaining 27% (.73 + .27 × .68 ≈ .91), we can trust that about 91% of the model codes are accurate.

Validation Step 2: Match with Human coding for Different Simulations.

The first step of validation involved matching human and model codes where the negotiation simulation used for training was the same as the simulation used for testing the model (The Sweet Shop). But users may have transcripts from any number of simulations or real-world negotiations, not just the simulation used in the Aslani et al. (2014) study. Therefore, we wanted to test how well the model would match human coders who applied the Aslani et al. coding scheme to transcripts from other simulations. We selected six transcripts from a study that used the Cartoon simulation and six transcripts from a study that used the Les Florets simulation. Since these transcripts had not previously been coded using the Aslani codes, we trained two coders to use them. After initial training, the coders reached an inter-coder reliability of kappa = .81. They coded the transcripts separately and then came together to discuss any cases where they disagreed and assign a final code. This provided the human codes for the Cartoon and Les Florets transcripts, which were then coded using our model.

The 12 transcripts had 2,711 speaking turns, of which 2,679 were single sentences. The model had perfect consistency for 88% of the speaking turns, high consistency for 10%, and modest consistency for 2%. There were 44 cases with less than 3-out-of-5 consistency. The match percentage was 72% for high-consistency codes and 65% for perfect-consistency codes (see Table 4). Overall, the match percentage was 65% (95% CI: .63, .67). This was lower than in our prior test, as expected, because these transcripts did not cover the same issues and topics as the training transcripts (which used The Sweet Shop simulation). For that reason, these results may better represent the model’s effectiveness with most transcripts. We also checked whether one set of transcripts did better than the other. The match percentage was 64.90% for the Les Florets transcripts and 65.16% for the Cartoon transcripts, suggesting that the model performs similarly across different simulations.

We also calculated Cohen’s kappa. The weighted Cohen's kappa was .56 (95% CI: .63, .67), with a no-information rate of .44 (p-value of the difference < .001). According to Landis and Koch (1977), this represents "moderate agreement," and according to Fleiss (1981) it is "fair to good" agreement. Rather than relying on conventional categorical guidelines to interpret the magnitude of kappa, Bakeman (2023) argues that researchers should estimate observer accuracy, that is, how accurate simulated observers would need to be to produce a given value of kappa. The KappaAcc program (Bakeman, 2022) was used to estimate observer accuracy, which was found to be 79%.

Table 4: Match Percentage by Consistency Level, Validation Step 2

Level of Consistency               % of Coded Units at This Level*    Units Not Matching    Units Matching    Match %
Modest Consistency (3 out of 5)    2%                                 34                    22                39%
High Consistency (4 out of 5)      10%                                75                    200               72%
Perfect Consistency (5 out of 5)   88%                                839                   1,541             65%

*44 cases did not reach the 3 out of 5 consistency threshold or the model failed to assign a code

The proportion of speaking units that fell into each category was roughly similar to what we saw in the first validation test, with the most common codes being Information, Question, and Response Positive. In this set of transcripts Substantiation was also fairly common (see Table 6). As with the first validation test, the model-human match percentage appears to be highly correlated with the frequency of each code.

The confusion matrix (see Table 5) shows that, once again, the largest number of mismatches comes from Information/Substantiation. It also shows that nearly all of the mismatches were cases where the model assigned a code of “information” when the humans assigned various other codes.

Table 5: Confusion Matrix, Validation Step 2

(Table 5 is not available in this version of the document.)

Table 6: Match Percentage by Code, Validation Step 2

Human Code % of units Across All Transcripts Model Match %
I 28.49 79.04%
S 15.39 52.77%
Q 14.32 92.75%
RP 13.24 83.75%
PROC 9.09 28.16%
MI 5.79 32.94%
OS 3.67 60.61%
SUM 3.26 52.27%
RN 2.52 14.70%
OM 2.30 35.48%
OR 1.26 32.35%
OA 0.41 45.45%
OT 0.26 14.28%

In order to assess the mismatches, we collected a random sample of 100 sentences with mismatches, along with the two prior sentences and the human and model codes. We then removed the column labels and randomly mixed the order of the two codes. Since the human coding in this case was done by our own coding team, we wanted a different person to select which of the two codes was more accurate; this was done by the first author. The results are shown in Table 7. About half of the time the human code was deemed clearly correct, but in roughly the other half of the cases either the model code was deemed correct or both the model and human codes were judged feasible. Sometimes this was because different parts of the sentence focused on different things, or because it was unclear whether (for example) “information” provided was just an expression of the speaker’s priorities or a way to back up their demands (“substantiation”). Looking, then, at the 31% of speaking units that were mismatches, perhaps half of them might still be deemed accurate.

Table 7: Assessment of 100 Sample Mismatches

Category         Code Selection                                    Count    Category Total
Clear Choice     Human Code is Correct                             47       68
                 Model Code is Correct                             21
Both Correct     Human Code is Correct but Both are Feasible       11       24
                 Model Code is Correct but Both are Feasible       8
                 Both are Equally Correct                          5
Both Incorrect   Human and Model Codes are Both Incorrect          6        6
Not Understood   Could Not Understand the Sentence                 2        2

Set up your transcripts for analysis by putting them into an Excel sheet. Files must not be longer than 999 rows (if you have longer transcripts, split them into smaller files). The format should be as shown below. Label the first column “SpeakerName” and list whatever names you have for the speakers (e.g., buyer/seller, John/Mary). Label the second column “Content” and include the material contained in your unit of analysis (which may be a speaking turn, a sentence, or a thought unit). Also include columns for "ResearcherName", "Email", and "Institution" (often a university) and enter that information in the first row beneath the headers (it does not need to be repeated on every row). Note that there is no space in the headings “SpeakerName” and “ResearcherName.”

If you use speaking turns then speakers will alternate, and the format will look like this:

SpeakerName Content ResearcherName Email Institution
Buyer Words in a speaking turn… Your Name Your Email Your Institution
Seller Words in a speaking turn…
Buyer Words in a speaking turn…
Seller Words in a speaking turn…
etc. Words in a speaking turn…

If you use sentences or thought units then it is possible that speakers may appear several times in a row, and the format will look like this:

SpeakerName Content ResearcherName Email Institution
Buyer Words in sentence or thought unit… Your Name Your Email Your Institution
Seller Words in sentence or thought unit…
Seller Words in sentence or thought unit…
Seller Words in sentence or thought unit…
Buyer Words in sentence or thought unit…
Buyer Words in sentence or thought unit…
Seller Words in sentence or thought unit…
etc. Words in sentence or thought unit…

Create one Excel file for each transcript. Name each file in the following way:

  • YourName_StudyName_1
  • YourName_StudyName_2
  • YourName_StudyName_3
  • etc.

For example, my first file would be named “RayFriedman_CrownStudy_1” and the second file would be named “RayFriedman_CrownStudy_2”, and so on.
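As a sketch, if your unitized transcript is already in an R data frame, one way to write a correctly formatted file is with the writexl package (any tool that produces an .xlsx file with these exact column headings works; the rows below are placeholders):

```r
# Sketch: write one unitized transcript to an .xlsx file in the required format.
# writexl is one convenient option; any tool that produces these exact column
# headings works equally well. The rows below are placeholders.
library(writexl)

transcript <- data.frame(
  SpeakerName    = c("Buyer", "Seller", "Buyer"),
  Content        = c("I would prefer Maintenance 1.",
                     "Sure, 73 would be fine.",
                     "Okay, fine."),
  ResearcherName = c("Your Name", NA, NA),   # researcher info in the first data row only
  Email          = c("Your Email", NA, NA),
  Institution    = c("Your Institution", NA, NA),
  stringsAsFactors = FALSE
)

stopifnot(nrow(transcript) <= 999)   # split longer transcripts into separate files

write_xlsx(transcript, "YourName_StudyName_1.xlsx")
```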

To submit your transcripts for the model to code, drag and drop one or several transcript files into the section below. If you see the files you want coded listed properly (just below the “Upload” button), click Submit. Note that each time you upload new files, they will replace the previously uploaded files and will be ready to submit.

It will likely take about 10 minutes for Claude to process each transcript, although this can vary based on how much demand Claude is handling at the moment you submit your files. Do not close your window while you are waiting for results – you will lose them. Once the analysis for each transcript is complete, you will receive the output in a CSV file that is automatically downloaded to your download folder. We suggest submitting just a few files at a time, so that you can check the output before running too many analyses. The output file will include:

  • Transcript Name
  • Speaker
  • The text (thought unit, sentence, or speaking turn)
  • The code assigned to that text
  • Consistency score for that code
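If you want to follow the earlier recommendation to rely mainly on codes with perfect consistency, something along these lines works once the CSV has been downloaded. The file name and column names here are illustrative assumptions, so check the headers in your own output file:

```r
# Sketch: read the downloaded output and keep only codes with perfect consistency.
# The file name and column names are illustrative -- check the headers in your own CSV.
results  <- read.csv("YourName_StudyName_1_output.csv", stringsAsFactors = FALSE)
reliable <- subset(results, Consistency == "5 out of 5")
table(reliable$Code)   # frequency of each code among the reliably coded units
```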
Footnotes:

1. This project was later expanded and published (though the published article did not use this coding scheme) as Aslani, S., Ramirez‐Marin, J., Brett, J., Yao, J., Semnani‐Azad, Z., Zhang, Z. X., ... & Adair, W. (2016). Dignity, face, and honor cultures: A study of negotiation strategy and outcomes in three cultures. Journal of Organizational Behavior, 37(8), 1178-1201.
2. Weingart, L. R., Thompson, L. L., Bazerman, M. H., & Carroll, J. S. (1990). Tactical behavior and negotiation outcomes. International Journal of Conflict Management, 1, 7-31; Gunia, B. C., Brett, J. M., Nandkeolyar, A. K., & Kamdar, D. (2011). Paying a price: Culture, trust, and negotiation consequences. Journal of Applied Psychology, 96, 774-789; Adair, W. L., & Brett, J. M. (2005). The negotiation dance: Time, culture, and behavioral sequences in negotiation. Organization Science, 16, 33-51.
3. In the supplementary file for: Jackel, E., Zerres, A., Hamshorn de Sanchez, C., Lehmann-Willenbrock, N., & Huffmeier, J. (2022). “NegotiAct: Introducing a comprehensive coding scheme to capture temporal interaction patterns in negotiations.” Group and Organization Management.
4. R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
5. Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). irr: Various coefficients of interrater reliability and agreement. R package version 0.84.1. https://CRAN.R-project.org/package=irr.
6. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174. doi:10.2307/2529310.
7. Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley. ISBN 978-0-471-26370-8.
8. Bakeman, R. (2022). KappaAcc: A program for assessing the adequacy of kappa. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01836-1