Incidence and Malignancy Rates of Diagnoses in the Bethesda System for Reporting Thyroid Aspiration Cytology: An Institutional Experience
Abstract
Background
The Bethesda System for Reporting Thyroid Cytopathology (BSRTC) uses six diagnostic categories to standardize communication of thyroid fine-needle aspiration (FNA) interpretations between clinicians and cytopathologists. Since several studies have questioned the diagnostic accuracy of this system, we examined its accuracy in our hospital.
Methods
We calculated the incidences and malignancy rates of each diagnostic category in the BSRTC for 1,730 FNAs that were interpreted by four cytopathologists in Gangnam Severance Hospital between October 1, 2011, and December 31, 2011.
Results
The diagnostic incidences of categories I-VI were 13.3%, 40.6%, 9.1%, 0.4%, 19.3%, and 17.3%, respectively, and the corresponding malignancy rates were 35.3%, 5.6%, 69.0%, 50.0%, 98.7%, and 98.9%, respectively. The malignancy rates for categories II, V, and VI were similar among the four cytopathologists, whereas those for categories I and III varied considerably.
Conclusions
Our findings suggest that institutions that use the BSRTC should regularly update their diagnostic criteria. We also propose that institutions issue an annual report of incidences and malignancy rates to help clinicians improve the management of patients with thyroid nodules.
The Bethesda System for Reporting Thyroid Cytopathology (BSRTC) was developed in 2008 to facilitate more accurate communication of thyroid fine-needle aspiration (FNA) interpretations between clinicians and cytopathologists.1 This system, which we adopted in 2010, classifies FNA results into six general diagnostic categories: I) nondiagnostic or unsatisfactory, II) benign, III) atypia of undetermined significance or follicular lesion of undetermined significance, IV) follicular neoplasm or suspicious for a follicular neoplasm, V) suspicious for malignancy, and VI) malignant.1 Each category is associated with an expected risk of malignancy: I) 1-4%, II) 0-3%, III) 5-15%, IV) 15-30%, V) 60-75%, and VI) 97-99%.1 Although this system is useful, studies investigating these risks have reported conflicting results, especially for category III. Furthermore, other studies have reported differences in malignancy rates among institutions and have offered various explanations for them.
In this study, we determined the distribution of FNA results and malignancy rates in each diagnostic category of the BSRTC in our hospital to determine whether our cytopathologists are using this system properly. Specifically, by analyzing data from individual cytopathologists, we hoped to ascertain whether our hospital has consistent FNA results in each diagnostic category. Finally, we suggest ways to improve the accuracy of classifying FNA diagnoses.
MATERIALS AND METHODS
Thyroid FNA cytology cases
We retrospectively analyzed 1,538 patients with thyroid nodules diagnosed by FNA between October 1, 2011, and December 31, 2011, at Gangnam Severance Hospital in Korea. This study met the criteria for exemption from institutional review board review. Each FNA diagnosis was made independently by one of four cytopathologists. Each aspiration sample was prepared as a liquid-based preparation or a conventional smear. Some FNAs were originally performed at other hospitals; in these cases, the slides were re-evaluated by our cytopathologists. If a nodule underwent multiple FNA procedures, we considered only the last diagnosis made in 2011. If a patient had multiple thyroid nodules, we considered each FNA diagnosis as a separate case.
Follow-up cases
Of the 1,538 patients, we included the 1,383 who had follow-up data at our hospital after FNA diagnosis of their thyroid nodules. Follow-up was defined as at least one additional FNA, thyroid sonography, or thyroid surgery between the date of the initial FNA and December 31, 2013. For cases initially classified as category IV, V, or VI, we considered only the histologic diagnosis of surgical specimens. For cases initially classified as category I, II, or III, we considered both the histologic diagnosis of surgical specimens (if available) and the most recent FNA diagnosis during the follow-up period; a most recent FNA diagnosis of category I, II, or III was considered benign.
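The outcome rules above can be expressed as a small scoring function. This is a minimal sketch under stated assumptions, not the procedure actually used in the study: the function name `follow_up_outcome` and its arguments are hypothetical, outcomes are reduced to "benign" or "malignant", a case with no repeat FNA is assumed to keep its index FNA as its most recent FNA diagnosis, and a category I-III case whose repeat FNA falls in category IV-VI without surgery is left unscored, because the text does not say how such cases were handled.

```python
from typing import Optional

VALID_HISTOLOGY = {"benign", "malignant"}


def follow_up_outcome(initial_category: int,
                      histology: Optional[str] = None,
                      repeat_fna_category: Optional[int] = None) -> Optional[str]:
    """Score one FNA case as "benign", "malignant", or None (unscored).

    Hypothetical helper illustrating the follow-up rules described above.
    initial_category: BSRTC category (1-6) of the index FNA.
    histology: "benign" or "malignant" if the nodule was resected, else None.
    repeat_fna_category: category of the most recent follow-up FNA, if any.
    """
    if not 1 <= initial_category <= 6:
        raise ValueError("BSRTC category must be between 1 and 6")
    if histology is not None:
        if histology not in VALID_HISTOLOGY:
            raise ValueError(f"unexpected histology: {histology!r}")
        # Surgical histology is used whenever it is available.
        return histology
    if initial_category >= 4:
        # Categories IV-VI are scored only from surgical histology.
        return None
    # Assumption: without a repeat FNA, the index FNA is the most recent one.
    latest = repeat_fna_category if repeat_fna_category is not None else initial_category
    if latest <= 3:
        # A most recent FNA in category I, II, or III counts as benign.
        return "benign"
    # Assumption: a repeat FNA in IV-VI without surgery is left unscored.
    return None


print(follow_up_outcome(2))                          # category II, no surgery
print(follow_up_outcome(5, histology="malignant"))   # category V, resected
print(follow_up_outcome(3, repeat_fna_category=5))   # unresolved case
```

Returning None rather than guessing keeps unresolved cases out of both the numerator and the denominator of any malignancy rate computed later.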
Statistical analysis
To determine the sensitivity, specificity, false negative rate, false positive rate, positive predictive value, and negative predictive value of each cytopathologist's diagnoses, we divided the BSRTC categories into two groups: 1) surgery not recommended (categories I, II, and III), because the suggested malignancy risks of these categories are low, and 2) surgery recommended (categories IV, V, and VI), because the suggested malignancy risks of these categories are high. The cut-off category for a malignant diagnosis by each cytopathologist was determined by receiver operating characteristic (ROC) curve analysis, performed with MedCalc Statistical Software ver. 12.7.5 (MedCalc Software, Ostend, Belgium). p-values less than .05 were considered statistically significant.
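The two-group dichotomization can be illustrated with a short sketch that builds a 2x2 table from per-category outcome counts and derives the six measures. The function name, the input format (category mapped to malignant and benign counts), and the example counts are hypothetical; the study's own figures are in Table 4.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    false_negative_rate: float
    false_positive_rate: float
    positive_predictive_value: float
    negative_predictive_value: float


def dichotomized_metrics(counts: dict[int, tuple[int, int]]) -> DiagnosticMetrics:
    """counts maps BSRTC category (1-6) to (malignant, benign) outcome counts."""
    tp = fp = fn = tn = 0
    for category, (malignant, benign) in counts.items():
        if category >= 4:
            # Categories IV-VI ("surgery recommended") are test-positive.
            tp += malignant
            fp += benign
        else:
            # Categories I-III ("surgery not recommended") are test-negative.
            fn += malignant
            tn += benign
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        false_negative_rate=fn / (tp + fn),
        false_positive_rate=fp / (tn + fp),
        positive_predictive_value=tp / (tp + fp),
        negative_predictive_value=tn / (tn + fn),
    )


# Hypothetical counts for one cytopathologist: category -> (malignant, benign).
example = {1: (2, 8), 2: (3, 47), 3: (5, 5), 4: (4, 2), 5: (18, 2), 6: (20, 0)}
print(dichotomized_metrics(example))
```

With these hypothetical counts there are 42 true positives, 4 false positives, 10 false negatives, and 60 true negatives, so the specificity is 60/64 = 0.9375. Note that the false negative rate is the complement of sensitivity and the false positive rate the complement of specificity.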
RESULTS
Patients and distribution of diagnostic categories in all cases
In this study, we examined 1,538 patients aged 14 to 86 years (mean, 50 years), with a female-to-male ratio of 3.8:1. Among these patients, 201 had multiple thyroid nodules, which resulted in a total of 1,730 FNA cases.
As shown in Table 1, the distribution of all cases among the six BSRTC diagnostic categories was as follows: 230 cases (13.3%) of category I, 702 cases (40.6%) of category II, 157 cases (9.1%) of category III, 7 cases (0.4%) of category IV, 335 cases (19.3%) of category V, and 299 cases (17.3%) of category VI. The total number and distribution of cases analyzed by each cytopathologist are also shown in Table 1.
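As a quick arithmetic check, the category counts above can be summed and each expressed as a share of the total. The sketch simply divides each count by the total; the variable names are illustrative.

```python
# Case counts per BSRTC category, transcribed from the paragraph above.
counts = {"I": 230, "II": 702, "III": 157, "IV": 7, "V": 335, "VI": 299}

total = sum(counts.values())
incidence = {category: 100 * n / total for category, n in counts.items()}

print(f"Total FNA cases: {total}")
for category, pct in incidence.items():
    # Two decimals so the underlying proportion is visible before rounding.
    print(f"Category {category}: {counts[category]} cases ({pct:.2f}%)")
```

The counts sum to the 1,730 FNA cases described above.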
Patients and distribution of outcomes in follow-up cases
A total of 1,383 patients met the follow-up criteria. Among them, 125 had multiple thyroid nodules, yielding a total of 1,547 cases that met the follow-up criteria. Of these 1,547 cases, 213 were followed up by at least one additional FNA, 485 by sonography, and the remaining 849 by surgery after the initial FNA. The patients were 14 to 86 years old (mean, 49 years), and the female-to-male ratio was 4.0:1. Follow-up periods ranged from 2 days to 2 years and 2 months (median, 150 days).
The distributions of follow-up diagnoses for each initial BSRTC diagnostic category are shown in Table 2. Category I diagnoses (116 cases) remained benign in 75 cases (64.7%) but were histologically confirmed as papillary carcinoma in 41 cases (35.3%). Category II diagnoses (702 cases) remained benign in 663 cases (94.5%) but were histologically confirmed as papillary carcinoma in 36 cases (5.1%), follicular carcinoma in two cases (0.3%), and poorly differentiated carcinoma in one case (0.1%). Category III diagnoses (126 cases) remained benign in 39 cases (30.9%) but were histologically confirmed as papillary carcinoma in 84 cases (66.7%), follicular carcinoma in two cases (1.6%), and medullary carcinoma in one case (0.8%). Category IV diagnoses (4 cases) were histologically confirmed as follicular carcinoma in two cases (50%) and as benign in the other two cases (50%). Category V diagnoses (314 cases) were histologically confirmed as papillary carcinoma in 306 cases (97.4%), medullary carcinoma in three cases (1.0%), and poorly differentiated thyroid carcinoma in one case (0.3%), and as benign in three cases (1.1%). Finally, category VI diagnoses (285 cases) were histologically confirmed as papillary carcinoma in 282 cases (98.9%) and as benign in three cases (1.1%). The histologically benign cases included 53 cases (6.7%) of adenomatous hyperplasia, 13 cases (1.7%) of lymphocytic thyroiditis, nine cases (1.1%) of follicular adenoma, one case (0.1%) of hyalinizing trabecular tumor, and three cases (0.4%) of fibrocalcific nodule. The total number and distribution of follow-up cases analyzed by each cytopathologist are shown in Table 3.
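Pooling the carcinoma subtypes listed above gives the number of malignant outcomes per category; dividing by the number of followed-up cases in each category yields the malignancy rates quoted in the Abstract. A minimal sketch, with the per-category counts transcribed from this paragraph:

```python
# (malignant, followed-up) cases per initial BSRTC category, from the text above.
# Malignant = all carcinoma subtypes combined.
follow_up = {
    "I": (41, 116),            # papillary
    "II": (36 + 2 + 1, 702),   # papillary + follicular + poorly differentiated
    "III": (84 + 2 + 1, 126),  # papillary + follicular + medullary
    "IV": (2, 4),              # follicular
    "V": (306 + 3 + 1, 314),   # papillary + medullary + poorly differentiated
    "VI": (282, 285),          # papillary
}

malignancy_rate = {cat: 100 * m / n for cat, (m, n) in follow_up.items()}

print(f"Followed-up cases: {sum(n for _, n in follow_up.values())}")
for cat, rate in malignancy_rate.items():
    print(f"Category {cat}: {rate:.1f}% malignant")
```

Printed to one decimal place, the rates are 35.3%, 5.6%, 69.0%, 50.0%, 98.7%, and 98.9%, and the followed-up cases total 1,547.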
We found no notable differences in the malignancy rates for categories II, V, and VI among the four cytopathologists, but considerable differences in the malignancy rates for categories I and III. However, the statistical significance of these differences could not be calculated, because each FNA diagnosis was made independently by only one of the four cytopathologists, so their rates were based on different sets of cases.
Instead, we used ROC curves to determine each cytopathologist's cut-off category for a malignant diagnosis, to see whether the category used to suggest malignancy, and consequently the diagnoses themselves, differed among the cytopathologists. As shown in Fig. 1, all cytopathologists used category III as the cut-off category for differentiating malignant from benign cases (p<.0001).
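A cut-off category of this kind can be approximated by scanning the ordinal thresholds and keeping the one that maximizes Youden's J (sensitivity + specificity - 1). This is a hedged sketch, not a reproduction of the MedCalc analysis: it assumes the reported ROC criterion is the Youden-optimal one and that "category III as the cut-off" means categories III-VI are called positive, and it computes neither the ROC area nor the p-value. The function name `youden_cutoff` is hypothetical. For illustration it is run on the pooled follow-up counts from the Results, with benign counts taken as followed-up minus malignant cases, rather than on any single cytopathologist's data.

```python
def youden_cutoff(counts: dict[int, tuple[int, int]]) -> tuple[int, float]:
    """Find the threshold k maximizing Youden's J = sensitivity + specificity - 1,
    where categories >= k are called positive.

    counts maps BSRTC category (1-6) to (malignant, total followed-up) counts.
    """
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    total_malignant = sum(m for m, _ in counts.values())
    total_benign = sum(n - m for m, n in counts.values())
    best_k, best_j = None, float("-inf")
    # Skip the lowest category: calling every case positive gives J = 0.
    for k in sorted(counts)[1:]:
        tp = sum(m for c, (m, _) in counts.items() if c >= k)
        fp = sum(n - m for c, (m, n) in counts.items() if c >= k)
        sensitivity = tp / total_malignant
        specificity = 1 - fp / total_benign
        j = sensitivity + specificity - 1
        if j > best_j:
            best_k, best_j = k, j
    return best_k, best_j


# Pooled follow-up counts from the Results: category -> (malignant, followed up).
pooled = {1: (41, 116), 2: (39, 702), 3: (87, 126), 4: (2, 4), 5: (310, 314), 6: (282, 285)}
k, j = youden_cutoff(pooled)
print(f"Optimal threshold: category >= {k} called positive (J = {j:.3f})")
```

On these pooled counts the scan selects k = 3 (J of about 0.834); thresholds at category IV or V give J of about 0.77, because moving category III to the negative side adds 87 false negatives while removing only 39 false positives.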
The sensitivity, specificity, false negative rate, false positive rate, positive predictive value, and negative predictive value of each cytopathologist's diagnoses are shown in Table 4.
DISCUSSION
We compared our findings with those reported in 11 previous studies2,3,4,5,6,7,8,9,10,11,12,13 and found several differences, which we attempt to explain below. The percentages of FNA diagnoses in BSRTC categories V and VI in our study (19.3% and 17.3%, respectively) were higher than those in other studies (mean, 4.6% and 7.9%, respectively) (Table 5). In addition, the percentage of FNA diagnoses in category II in our study (40.6%) was lower than that in previous studies (mean, 62.7%) (Table 5). One possible reason for these differences is that our hospital is a referral hospital for thyroid surgery, so many patients suspected of having thyroid cancer at other hospitals come here to have their FNA diagnoses confirmed and, if possible, to undergo surgery. Baloch et al.10 also reported high percentages of FNA diagnoses in BSRTC categories V and VI (19.1% and 21.3%, respectively) at a referral hospital, similar to our results. Similarly, Lee et al.13 found that 13.0% of FNA diagnoses at a referral hospital were classified as category VI.
We also found differences between the malignancy rates of some BSRTC categories in our hospital and those reported previously. For example, the malignancy rates in categories I, III, IV, and V in our study (35.3%, 69.0%, 50.0%, and 98.7%, respectively) were higher than those given in the original BSRTC guidelines (1-4%, 5-15%, 15-30%, and 60-75%, respectively).1 They were also higher than those in the other studies that we examined (mean, 24.2%, 33.9%, 37.2%, and 72.6%, respectively) (Table 6).2,3,4,5,6,7,8,9,10,11,12,13 There are two possible reasons for these differences. First, although the BSRTC guidelines recommend a repeat FNA for patients with category I or III diagnoses, in Korea, patients whose thyroid nodules are clinically strongly suspicious for malignancy undergo surgery without a repeat FNA, although a frozen section examination may be performed. Second, Korean patients tend to be more concerned about false positive results than false negative results, which may pressure cytopathologists to underdiagnose FNA cases to avoid false positive diagnoses. Lee et al.13 also reported high malignancy rates for categories III and V (79.0% and 97.6%, respectively), similar to our results. However, their explanation differed from ours: they suspected that their cytopathologists did not properly apply the BSRTC classification criteria, which were still relatively new at the time of their study. If our explanation that Korean cytopathologists tend to underdiagnose FNA cases is correct, then clinicians may need to rely on intraoperative diagnoses in those cases. Therefore, either the BSRTC classification criteria need to be refined and adapted to local practice, or the expected malignancy rates of the BSRTC diagnostic categories may need to be modified.
In addition, we recommend providing clinicians with current institutional data about malignancy rates in these categories to help them improve their management of thyroid nodule cases.
In this study, we found no notable differences in the malignancy rates for categories II, V, and VI among the four cytopathologists, but considerable differences in the malignancy rates for categories I and III. However, the large number of patients in these categories who did not undergo surgery may have biased these results. Several previous studies have noted that malignancy rates in the BSRTC categories can differ among cytopathologists both within and between institutions. For example, Layfield et al.14 reported wide variation in the malignancy rate for category III diagnoses between institutions and among cytopathologists within the same institution, depending on whether they had received cytopathology fellowship training. Similarly, Wu et al.12 found that differences in thyroid cytopathology diagnoses may arise from differences in cytopathologists' experience or training. Furthermore, Cibas and Ali,1 who wrote the original BSRTC paper, later acknowledged that "category III may never have good interobserver reproducibility, even after pathologists familiarize themselves with the criteria in the atlas,"15 but argued that this category remains useful because it provides clinically important distinctions for some cases. Because category III diagnoses may differ substantially among cytopathologists, we recommend that institutions prepare annual reports of the malignancy rates of diagnoses in categories III-VI by individual cytopathologist, so that clinicians can improve their management of patients with thyroid nodules. We also recommend that clinicians consider refining category III to better define the threshold for differentiating malignant from benign cases.
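The proposed annual report is essentially a grouped tabulation of scored follow-up outcomes by cytopathologist and category. The sketch below assumes a hypothetical record layout of (cytopathologist, category, outcome) tuples, with outcome set to None when no usable follow-up exists; the function name `annual_report`, the reader labels, and the records are placeholders, and the default restricts the report to categories III-VI as recommended above.

```python
from collections import defaultdict


def annual_report(records, categories=range(3, 7)):
    """Tabulate malignancy rates per cytopathologist and BSRTC category.

    records: iterable of (cytopathologist, category, outcome) tuples, where
    outcome is "malignant", "benign", or None when follow-up is unavailable.
    categories: categories to report (III-VI by default).
    Returns {cytopathologist: {category: (malignant, scored, rate)}}.
    """
    # tally[reader][category] = [malignant count, scored count]
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for reader, category, outcome in records:
        if category not in categories or outcome is None:
            continue  # out-of-scope or unscored cases are not counted
        cell = tally[reader][category]
        cell[1] += 1
        if outcome == "malignant":
            cell[0] += 1
    return {
        reader: {cat: (m, n, m / n) for cat, (m, n) in sorted(by_cat.items())}
        for reader, by_cat in sorted(tally.items())
    }


# Hypothetical records; reader labels "A" and "B" are placeholders.
records = [
    ("A", 3, "malignant"), ("A", 3, "benign"), ("A", 3, "malignant"),
    ("A", 2, "benign"), ("B", 3, "benign"), ("B", 3, None), ("B", 5, "malignant"),
]
for reader, rows in annual_report(records).items():
    print(reader, rows)
```

Excluding unscored cases from the denominator mirrors the study's approach of computing malignancy rates only over followed-up cases; reporting the scored count alongside each rate shows how much a rate rests on small numbers, as with category IV here.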
Comparing malignancy rates was difficult because cases in categories I, II, and III usually do not undergo surgery, and category IV contained too few cases for a meaningful comparison. Other reports have commented on the same difficulty in comparing malignancy rates.16,17,18
Our comparisons of the malignancy rates of individual cytopathologists may be limited in two ways. First, our methods differ from those of previous studies that compared the FNA diagnoses of individual cytopathologists; those studies compared cytopathologists who had all diagnosed the same slides.19,20,21,22,23,24 This approach was not applicable to our study because we performed a retrospective data analysis, and such quality control methods are often impractical in daily hospital practice. Second, our conclusion about individual cytopathologists rests on the cut-off category for a malignant diagnosis, which did not differ significantly among them: all cytopathologists used category III as the cut-off category for differentiating malignant from benign cases (p<.0001). We therefore concluded that the malignancy rates of the diagnoses made by individual cytopathologists did not differ. However, the cut-off needs to be improved, because category III is not an appropriate category for determining malignancy.
The sensitivity, specificity, false negative rate, false positive rate, positive predictive value, and negative predictive value of individual cytopathologists' diagnoses did not differ significantly among the cytopathologists, although this comparison is limited by the large number of cases that were not treated surgically.
Conclusion
In conclusion, the distribution of diagnostic categories at our institution is shifted towards categories V and VI, and the malignancy rates of categories I, III, and V were higher than those in other reports. Thus, our findings regarding the distribution of FNA diagnoses in the BSRTC diagnostic categories and their malignancy rates suggest the need for future improvements in the BSRTC. Specifically, the determination of malignancy rates needs to be modified to reduce additional diagnostic procedures, such as intraoperative diagnosis. In addition, we propose that institutions prepare, communicate, and use annual reports of the malignancy rates of their cytopathologists' diagnoses to help clinicians better manage patients with thyroid nodules.
Notes
No potential conflict of interest relevant to this article was reported.