Model 1 uses the coding scheme of Aslani, Ramirez-Martin, Brett, Yao, Semnani-Azada, Zhang, Tinsley, Weingart, and Adair (2014), “Measuring negotiation strategy and predicting outcomes: Self-reports, behavioral codes, and linguistic codes,” presented at the annual conference of the International Association for Conflict Management, Leiden, The Netherlands.1
As the authors describe in the 2014 paper, “we developed a 14-item code based on prior negotiation coding schemes (e.g. Adair & Brett, 2005; Gunia et al., 2011; Weingart et al., 1990)2 to measure participants’ use of tactics. Major categories in the code were information, offers, substantiation, negative and positive reactions, and a miscellaneous category.” The simulation used in their study was The Sweet Shop (newnegotiationexercises.com). The authors shared with us 75 transcripts they had collected and coded using human coders.
Each of the codes is shown below except for “Response Embarrassed” which appeared only once across all transcripts, making it hard to teach that code to the model. For each of the remaining 13 codes, we provide a definition of the code, a short explanation, and sample sentences from the transcripts. The sample sentences let you see how these scholars operationalized their codes, which is what Model 1 learned from and tries to reproduce when coding your transcripts. As with any coding scheme, different scholars might operationalize concepts slightly differently. You should decide if this coding scheme will be useful to you by reviewing how the authors used it.
When reporting your results from this model, please cite the Aslani et al. (2014) paper referenced above.
Below we have listed each of the 13 codes used in this coding scheme, along with a brief definition and explanation, followed by example sentences from the coded transcripts. (If you prefer to view this material in an Excel file, download this file.)
NOTE: A bold border surrounding several sentences indicates that those sentences are part of an ongoing conversation.
Code Number | Code Name | Description | Sample Sentence Categories |
---|---|---|---|
1 | Single Issue Offer (OS) | Making an offer regarding one issue in the negotiation, such as "I'll give you Staff option 3." This code includes stating a preference, arguing for a position, expressing a want, making a suggestion to do something, and giving a concession. | SINGLE ISSUE OFFER - STATING YOUR OWN OFFER OR PREFERENCES; SINGLE ISSUE OFFER - DEMANDING/PUSHING YOUR OFFER |
2 | Multi Issue Offer (OM) | Making an offer that includes at least two different issues, such as "I'll give you staff 1 and Design 4," or suggesting a tradeoff between issues. | MULTI ISSUE OFFER - OFFER INCLUDES SEVERAL ISSUES |
3 | Offer Accept (OA) | Accepting an offer on one or more issues, such as "OK, staff 1." | ACCEPT AN OFFER |
4 | Offer Reject (OR) | Clear rejection of an offer, such as "no way staff 1 and design 4." | REJECT AN OFFER |
5 | Question (Q) | Asking a question about the other side's preferences or statements. | QUESTION - ASKING CLARIFICATION; QUESTION - ASKING PREFERENCES; QUESTION - ASKING ABOUT WILLINGNESS TO CONCEDE. The following might have been assigned different codes: QUESTION - COULD ALSO BE "RESPONSE NEGATIVE" (SNIPING); QUESTION - COULD ALSO BE "SUMMARIZE"; QUESTION - COULD ALSO BE "PROCEDURE" |
6 | Information (I) | Comments that provide information about self, preferences, or reactions; elaboration of a point. | INFORMATION - ABOUT OWN PREFERENCES; INFORMATION - ABOUT OWN REACTIONS; INFORMATION - ABOUT SELF AND ONE'S BUSINESS; INFORMATION - ELABORATION. The following might have been assigned different codes: INFORMATION - ABOUT WILLINGNESS TO CHANGE OFFER |
7 | Substantiation (S) | Explanations for why the other should agree to an offer or request. The explanation might be based in claims of fairness, need, rights, or power (threat, toughness, alternatives). | SUBSTANTIATION - FAIRNESS; SUBSTANTIATION - I REALLY NEED IT; SUBSTANTIATION - POWER/THREAT; SUBSTANTIATION - MY RIGHTS |
8 | Response Positive (RP) | A positive response, such as "great, yes ok," and any positive mimicking (such as following "yeah" with "yeah"). | POSITIVE RESPONSES |
9 | Response Negative (RN) | A negative response to a proposal or idea, such as "no never no way," and any negative mimicking (repeating the other's negative words). | RESPONSE NEGATIVE - STRONG NO; RESPONSE NEGATIVE - SNIPING |
10 | Summarize (SUM) | Summarizing and confirming what was said, decided, or known, such as "so you prefer x to y." | SUMMARIZE - SUMMARIZING AND CONFIRMING |
11 | Procedure (PROC) | Comments about how to do the negotiation, such as "Can we move to a specific issue? Can we discuss one issue at a time? Can we move on to another issue? We're running out of time." Includes comments about logistics. | PROCEDURE - PROCESS COMMENTS |
12 | Miscellaneous (MI) | Comments that do not fit into other codes but are on-topic. | MISCELLANEOUS - ODDS AND ENDS |
13 | Off Topic (OT) | Comments not related in any way to the negotiation. | OFF TOPIC |
Transcripts are coded in three steps:
1. Unitization (you need to do this): The model provides one code for each set of words or sentences that you identify as a unit in your Excel document. Units can be speaking turns, sentences, or thought units. The easiest to set up is the speaking turn, since switches between speakers are clearly identifiable in transcripts. The next easiest is the sentence, identified by one of these symbols: . ? ! (although different transcribers may end sentences in different places). The hardest unit to create is the thought unit, since that takes careful analysis and can represent as much work as the coding itself (see the NegotiAct coding manual3 for how to create thought units). Clarity of meaning runs in the opposite direction: the longer the unit, the more likely it contains multiple ideas, and the less clear it is for human or AI coders which part to code. Aslani et al. (2014) coded speaking turns, but 72% of their speaking turns contained just one sentence. The closest alignment with the training data is therefore to use sentences as the unit.
2. Model Assigns Code: The model assigns a code to each unit you submit, based on in-context learning. Coding is guided by the prompt we developed and tested. For more on in-context learning see Xie and Min (2022). Our prompt for this model includes several elements:
3. We Run the Model Five Times: We automatically run the model five times to assess the consistency of results. As expected, the results are not always the same: with in-context learning the model learns anew with each run and may learn slightly differently each time. Variation is also expected because some units can reasonably be coded in more than one way. Running the coding model five times gives us five codes for each speaking unit. If three, four, or five of the five runs produce the same code, we report that code and indicate its level of “consistency” (three, four, or five out of five); a minimal sketch of this majority-vote step appears below. If there are not at least three consistent results out of five runs, or if the model fails to assign a code, we do not report a model code. In these cases, the researcher needs to do human coding.
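To make the unitization and the five-run consistency rule concrete, here is a minimal Python sketch under our own assumptions. It shows sentence-level splitting of a speaking turn and the majority-vote step that turns five codes per unit into a reported code with a consistency level. The function names and example codes are illustrative only and are not part of the actual pipeline.

```python
import re
from collections import Counter

def split_into_sentences(speaking_turn):
    # Split a speaking turn into sentence units on ".", "?", or "!",
    # the simplest approximation of sentence-level unitization described in step 1.
    parts = re.split(r"(?<=[.?!])\s+", speaking_turn.strip())
    return [p for p in parts if p]

def consensus(codes_from_five_runs):
    # Turn the five codes assigned to one unit into a reported code and a
    # consistency level (3, 4, or 5 out of 5). Return (None, n) if no code
    # reaches the 3-out-of-5 threshold or no code was assigned at all.
    counts = Counter(c for c in codes_from_five_runs if c)
    if not counts:
        return None, 0
    code, n = counts.most_common(1)[0]
    return (code, n) if n >= 3 else (None, n)

print(split_into_sentences("That seems fair. Can we move on to staffing?"))
print(consensus(["S", "S", "I", "S", "S"]))   # ('S', 4): high consistency
```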
Validation occurred in several steps:
Validation Step 1: Compare the model coding with the human coding by Aslani et al. (2014). To do this, we asked the model to code the 4,968 units contained in the Aslani et al. (2014) transcripts that were not selected for training. We looked at several criteria.
Table 1: Match Percentage by Consistency Level, Validation Step 1
Level of Consistency | Runs Agreeing | % of Units at This Consistency Level (Among Units Assigned a Code) | Number* | Match with Human Codes | Percentage Match |
---|---|---|---|---|---|
Modest Consistency | 3 out of 5 | 0.04% | 0 | not match | |
| | | 2 | match | 100% |
High Consistency | 4 out of 5 | 2.5% | 82 | not match | |
| | | 44 | match | 35% |
Perfect Consistency | 5 out of 5 | 97.5% | 1,239 | not match | |
| | | 3,513 | match | 74% |
*90 cases did not reach the 3 out of 5 consistency threshold or the model failed to assign a code
We also calculated Cohen’s kappa, treating the model codes as coming from one rater and the human codes as coming from a second rater. Unlike the “percentage match,” this statistic accounts for matches that might occur by chance. Cohen's kappa was calculated in R (R Core Team, 2022)4 using the irr package (Gamer & Lemon, 2019)5. Cohen's kappa was 0.69, with a no-information rate of 0.27 (p-value of the difference < .001). According to Landis and Koch (1977)6 this represents "substantial agreement," and according to Fleiss (1981)7 it is "fair to good" agreement. Rather than relying on conventional categorical guidelines to interpret the magnitude of kappa, Bakeman (2023) argues that researchers should estimate observer accuracy, that is, how accurate simulated observers would need to be to produce a given value of kappa. The KappaAcc program (Bakeman, 2022)8 was used to estimate observer accuracy, which was found to be 85%.
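The authors computed kappa in R with the irr package. For readers who want to reproduce the same agreement statistics on their own human-versus-model code columns, a rough Python equivalent using scikit-learn might look like the following; the file name and column names ("coded_units.xlsx", "human_code", "model_code") are hypothetical.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical file with one row per unit and the two code columns side by side.
df = pd.read_excel("coded_units.xlsx")              # columns: "human_code", "model_code"
df = df.dropna(subset=["human_code", "model_code"])

# Raw percentage match (does not correct for chance agreement).
pct_match = (df["human_code"] == df["model_code"]).mean()

# Cohen's kappa, treating the model as one rater and the human coders as the other.
kappa = cohen_kappa_score(df["human_code"], df["model_code"])

# No-information rate: the match rate from always guessing the most common human code.
nir = df["human_code"].value_counts(normalize=True).max()

print(f"percentage match = {pct_match:.2%}, kappa = {kappa:.2f}, no-information rate = {nir:.2f}")
```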
In terms of absolute numbers of mismatches, the largest set is 124 units that humans coded as Information but the model coded as Substantiation. We recognized this issue early in our testing, and it led us to add instructions to the prompt to reduce this mismatch. The fundamental problem is that Substantiation is often achieved by providing information, but to count as Substantiation that information must support a particular argument or claim. There were also 81 cases that humans coded as Substantiation but the model coded as Information. The next largest set of mismatches was 81 cases where humans assigned Response Positive while the model assigned Offer Accept, a confusion that is easy to imagine.
Table 2: Confusion Matrix, Validation Step 1
Table 3: Match Percentage by Code, Validation Step 1
Human Code | % of units Across All Transcripts | Model Match % |
---|---|---|
RP | 25.37 | 88.45% |
I | 18.79 | 66.96% |
Q | 18.46 | 86.46% |
OS | 8.59 | 70.41% |
PROC | 7.13 | 57.76% |
S | 5.64 | 59.27% |
SUM | 5.49 | 36.94% |
OA | 4.28 | 69.38% |
MI | 3.79 | 63.78% |
RN | 1.33 | 40.00% |
OM | 0.61 | 60.00% |
OR | 0.27 | 38.46% |
OT | 0.25 | 8.33% |
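Tables of this kind (a confusion matrix like Table 2 and per-code match rates like Table 3) can be reproduced for your own data with a few lines of pandas once the human and model codes sit side by side. This is a rough sketch under our own assumptions; the file and column names ("coded_units.xlsx", "human_code", "model_code") are hypothetical.

```python
import pandas as pd

# Hypothetical file with one row per unit and the human and model codes side by side.
df = pd.read_excel("coded_units.xlsx")   # columns: "human_code", "model_code"

# Confusion matrix: rows are human codes, columns are model codes (the layout of Table 2).
confusion = pd.crosstab(df["human_code"], df["model_code"])

# Per-code prevalence and model match rate (the layout of Table 3).
by_code = (
    df.assign(match=df["human_code"] == df["model_code"])
      .groupby("human_code")["match"]
      .agg(model_match_pct="mean", n_units="size")
)
by_code["model_match_pct"] *= 100
by_code["pct_of_units"] = 100 * by_code["n_units"] / len(df)

print(confusion)
print(by_code.sort_values("pct_of_units", ascending=False))
```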
Validation Step 2: Match with Human Coding for Different Simulations.
The first step of validation matched human and model codes where the negotiation simulation used for training was the same as the simulation used for testing the model (The Sweet Shop). But users may have transcripts from any number of simulations or real-world negotiations, not just the simulation used in the Aslani et al. (2014) study. We therefore wanted to test how well the model would match human coders who applied the Aslani et al. coding scheme to transcripts from other simulations. We selected 6 transcripts from a study that used the Cartoon simulation and 6 transcripts from a study that used the Les Florets simulation. Since these transcripts had not originally been coded with the Aslani codes, we trained two coders to use them. After initial training, the coders reached an inter-coder reliability of kappa = .81. They coded the transcripts separately and then met to discuss and resolve any disagreements. This provided the human codes for the Cartoon and Les Florets transcripts, which were then coded using our model.
The 12 transcripts had 2,711 speaking turns, of which 2,679 were single sentences. The model had perfect consistency for 88% of the speaking turns, high consistency for 10%, and modest consistency for 2%. There were 44 cases with less than 3-out-of-5 consistency. The match percentage was 72% for high consistency codes and 65% for perfect consistency codes (see Table 4). Overall, the match percentage was 65% (95% CI: .63, .67). This was lower than in our prior tests, as expected, because these transcripts did not have the same issues and topics as the training transcripts (which used The Sweet Shop simulation). For that reason, these results may better represent the model’s effectiveness with most transcripts. We also checked whether one set of transcripts did better than the other. The match percentage was 64.90% for the Les Florets transcripts alone and 65.16% for the Cartoon transcripts alone, suggesting that the model should do just as well with transcripts from different simulations.
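The confidence interval reported above is a standard binomial proportion interval. If you want to compute one for your own match percentage, a minimal sketch using statsmodels follows; the counts are placeholders, not the study's numbers.

```python
from statsmodels.stats.proportion import proportion_confint

# Placeholder counts: units where model and human codes agree, out of all units assigned a code.
matches, total = 650, 1000
low, high = proportion_confint(matches, total, alpha=0.05, method="wilson")
print(f"match = {matches / total:.0%}, 95% CI = ({low:.2f}, {high:.2f})")
```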
We also calculated Cohen’s kappa. The weighted Cohen's kappa was .56 (95% CI: .63, .67), with a no-information rate of .44 (p-value of the difference < .001). According to Landis and Koch (1977) this kappa represents "moderate agreement," and according to Fleiss (1981) it is "fair to good" agreement. Rather than relying on conventional categorical guidelines to interpret the magnitude of kappa, Bakeman (2023) argues that researchers should estimate observer accuracy, that is, how accurate simulated observers would need to be to produce a given value of kappa. The KappaAcc program (Bakeman, 2022) was used to estimate observer accuracy, which was found to be 79%.
Table 4: Match Percentage by Consistency Level, Validation Step 2
Level of Consistency | Runs Agreeing | % of Units at This Consistency Level (Among Units Assigned a Code) | Number* | Match with Human Codes | Percentage Match |
---|---|---|---|---|---|
Modest Consistency | 3 out of 5 | 2% | 34 | not match | |
| | | 22 | match | 39% |
High Consistency | 4 out of 5 | 10% | 75 | not match | |
| | | 200 | match | 72% |
Perfect Consistency | 5 out of 5 | 88% | 839 | not match | |
| | | 1,541 | match | 65% |
*44 cases did not reach the 3 out of 5 consistency threshold or the model failed to assign a code
The proportion of speaking units that fell into each category was roughly similar to what we saw in the first validation test, with the most common codes being Information, Question, and Response Positive. In this set of transcripts, Substantiation was also fairly common (see Table 6). As with the first validation test, the model-human match percentage appears to be highly correlated with how frequently a code occurs.
The confusion matrix (see Table 5) shows that, once again, the largest number of mismatches comes from Information/Substantiation. It also shows that nearly all of the mismatches were cases where the model assigned a code of “information” when the humans assigned various other codes.
Table 5: Confusion Matrix, Validation Step 2
Table 6: Match Percentage by Code, Validation Step 2
Human Code | % of units Across All Transcripts | Model Match % |
---|---|---|
I | 28.49 | 79.04% |
S | 15.39 | 52.77% |
Q | 14.32 | 92.75% |
RP | 13.24 | 83.75% |
PROC | 9.09 | 28.16% |
MI | 5.79 | 32.94% |
OS | 3.67 | 60.61% |
SUM | 3.26 | 52.27% |
RN | 2.52 | 14.70% |
OM | 2.30 | 35.48% |
OR | 1.26 | 32.35% |
OA | 0.41 | 45.45% |
OT | 0.26 | 14.28% |
To assess the mismatches, we collected a random sample of 100 sentences with mismatches, along with the two prior sentences and the human and model codes. We then removed the column labels and randomly mixed the order of the two codes, so the judge could not tell which code came from the human coders and which from the model. Since the human coding in this case was done by our own coding team, we wanted a different person to select which of the two codes was more accurate; this was done by the first author. The results are shown in Table 7 (a sketch of this sampling procedure follows the table). About half of the time the human code was deemed accurate, but in about half of the cases either the model code was deemed accurate or both code selections were feasible. Sometimes this was because different parts of the sentence focused on different things, or because it was unclear whether (for example) the “information” provided was just an expression of the speaker’s priorities or a way to back up their demands (“substantiation”). Looking, then, at the 31% of speaking units that were mismatches, perhaps half of them might still be deemed accurate.
Table 7: Assessment of 100 Sample Mismatches
Code Selection | | Count | Group Total |
---|---|---|---|
Clear Choice | Human Code is Correct | 47 | 68 |
| | Model Code is Correct | 21 | |
Both Correct | Human Code is Correct but Both are Feasible | 11 | 24 |
| | Model Code is Correct but Both are Feasible | 8 | |
| | Both are Equally Correct | 5 | |
Both Incorrect | Human and Model Codes are Both Incorrect | 6 | 6 |
Not Understood | Could not Understand the Sentence | 2 | 2 |
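If you want to run the same kind of blinded audit on your own transcripts, a rough pandas sketch follows. The sample size of 100 and the two prior sentences of context come from the description above; everything else (the file name, column names, and output format) is our own assumption for illustration.

```python
import random
import pandas as pd

# Hypothetical file with one row per unit; columns "Content", "human_code", "model_code".
df = pd.read_excel("coded_units.xlsx").reset_index(drop=True)

# Keep only the mismatches, then draw a random sample of 100 of them.
mismatches = df[df["human_code"] != df["model_code"]]
sample = mismatches.sample(n=min(100, len(mismatches)), random_state=1)

rows = []
for idx in sample.index:
    start = max(0, idx - 2)                                   # two prior sentences for context
    context = " ".join(df.loc[start:idx - 1, "Content"].astype(str)) if idx > 0 else ""
    codes = [df.at[idx, "human_code"], df.at[idx, "model_code"]]
    random.shuffle(codes)                                     # blind the judge to each code's source
    rows.append({
        "context": context,
        "sentence": df.at[idx, "Content"],
        "code_A": codes[0],
        "code_B": codes[1],
        "human_code_answer_key": df.at[idx, "human_code"],    # kept separately for scoring later
    })

pd.DataFrame(rows).to_excel("mismatch_audit.xlsx", index=False)
```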
Set up your transcripts for analysis by putting them into an Excel sheet. Files must not be longer than 999 rows (if you have longer transcripts, split them into smaller files). The format should be as shown below. Label the first column “SpeakerName” and list whatever names you have for the speakers (e.g., Buyer/Seller, John/Mary). Label the second column “Content” and include the material contained in your unit of analysis (which may be a speaking turn, a sentence, or a thought unit). Also include columns for "ResearcherName", "Email", and "Institution" (often a university), and enter that information in the first data row. Note that there are no spaces in the headings “SpeakerName” and “ResearcherName.”
If you use speaking turns then speakers will alternate, and the format will look like this:
SpeakerName | Content | ResearcherName | Email | Institution |
---|---|---|---|---|
Buyer | Words in a speaking turn… | Your Name | Your Email | Your Institution |
Seller | Words in a speaking turn… | |||
Buyer | Words in a speaking turn… | |||
Seller | Words in a speaking turn… | |||
etc. | Words in a speaking turn… |
If you use sentences or thought units then it is possible that speakers may appear several times in a row, and the format will look like this:
SpeakerName | Content | ResearcherName | Email | Institution |
---|---|---|---|---|
Buyer | Words in sentence or thought unit… | Your Name | Your Email | Your Institution |
Seller | Words in sentence or thought unit… | |||
Seller | Words in sentence or thought unit… | |||
Seller | Words in sentence or thought unit… | |||
Buyer | Words in sentence or thought unit… | |||
Buyer | Words in sentence or thought unit… | |||
Seller | Words in sentence or thought unit… | |||
etc. | Words in sentence or thought unit… |
Create one Excel file for each transcript. Name each file in the following way: YourName_StudyName_TranscriptNumber.
For example, my first file would be named “RayFriedman_CrownStudy_1” and the second file would be named “RayFriedman_CrownStudy_2”, and so on.
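If your transcripts are already in a data frame or spreadsheet, a short pandas sketch like the one below can write a file in the required format and apply the naming convention. The researcher details, transcript content, and file name here are placeholders; this is a minimal sketch, not part of the submission system itself.

```python
import pandas as pd

# Placeholder researcher details and transcript content.
researcher_name, email, institution = "Your Name", "you@university.edu", "Your University"
units = [
    ("Buyer", "Words in a speaking turn..."),
    ("Seller", "Words in a speaking turn..."),
    ("Buyer", "Words in a speaking turn..."),
]

df = pd.DataFrame(units, columns=["SpeakerName", "Content"])
for col in ("ResearcherName", "Email", "Institution"):
    df[col] = ""
# Researcher details go once, in the first data row, as in the examples above.
df.loc[0, ["ResearcherName", "Email", "Institution"]] = [researcher_name, email, institution]

# Files must not exceed 999 rows; split longer transcripts before writing.
assert len(df) <= 999, "split this transcript into smaller files"

# One Excel file per transcript, named YourName_StudyName_TranscriptNumber.
df.to_excel("RayFriedman_CrownStudy_1.xlsx", index=False)
```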