Interobserver Agreement on Pathologic Features of Liver Biopsy Tissue in Patients with Nonalcoholic Fatty Liver Disease
Abstract
Background:
The histomorphologic criteria for the pathological features of liver tissue from patients with non-alcoholic fatty liver disease (NAFLD) remain subjective, causing confusion among pathologists and clinicians. In this report, we studied interobserver agreement of NAFLD pathologic features and analyzed causes of disagreement.
Methods:
Thirty-one cases of clinicopathologically diagnosed NAFLD from 10 hospitals were selected. One hematoxylin and eosin and one Masson’s trichrome-stained virtual slide from each case were blindly reviewed with regard to 12 histological parameters by 13 pathologists in a gastrointestinal study group of the Korean Society of Pathologists. After the first review, we analyzed the causes of disagreement and defined detailed morphological criteria. The glass slides from each case were reviewed a second time after a consensus meeting. The degree of interobserver agreement was determined by multi-rater kappa statistics.
Results:
Kappa values of the first review ranged from 0.0091 to 0.7618. Acidophilic bodies (k = 0.7618) and portal inflammation (k = 0.5914) showed high levels of agreement, whereas microgranuloma (k = 0.0984) and microvesicular fatty change (k = 0.0091) showed low levels of agreement. After the second review, the kappa values of the four major pathological features increased from 0.3830 to 0.5638 for steatosis grade, from 0.1398 to 0.2815 for lobular inflammation, from 0.1923 to 0.3362 for ballooning degeneration, and from 0.3303 to 0.4664 for fibrosis.
Conclusions:
More detailed histomorphological criteria must be defined for correct diagnosis and high interobserver agreement of NAFLD.
Nonalcoholic fatty liver disease (NAFLD) is a clinicopathological spectrum characterized by hepatic steatosis without a history of significant alcohol use or other known liver disease. Commonly associated factors include obesity, insulin resistance syndrome, and hyperlipidemia. Other associations include jejuno-ileal bypass or gastroplasty surgery for morbid obesity, parenteral nutrition, forms of malnutrition, bacterial contamination of the small bowel, certain inherited metabolic disorders, and a wide range of drugs and environmental toxins [1]. The natural history and histological spectrum of NAFLD range from stable simple steatosis to progressive or advanced disease, such as steatohepatitis, cirrhosis, and even hepatocellular carcinoma [2-13]. A clinicopathological correlation is needed to diagnose NAFLD, as clinical assessment, laboratory tests, and imaging techniques alone provide limited pertinent information. Liver biopsy is the only reliable diagnostic tool to evaluate a patient with suspected NAFLD. On biopsy, the degree of steatosis, liver injury, and fibrosis associated with NAFLD can be identified, and simple steatosis can be differentiated from nonalcoholic steatohepatitis (NASH), the progressive form of NAFLD [14,15]. A NAFLD histopathological grading system was proposed by Brunt et al. in 1999 [16]. They proposed classifying steatosis, inflammation, and ballooning degeneration each into three grades and fibrosis into four stages. Since then, Kleiner et al. [17] of the NASH Clinical Research Network have proposed a more relevant classification. In addition to grading steatosis, inflammation, and ballooning degeneration, they proposed subdividing fibrosis around the central area into three more detailed grades. They also scored each finding and suggested that a total score of five or more indicates NASH.
However, until now, classifications for NAFLD or NASH have differed depending on the researcher, and no NAFLD histopathological grading system has been standardized, causing confusion for clinicians and pathologists. Furthermore, the histomorphological criteria for NAFLD pathologic features in liver tissue remain subjective. Thus, in this study, we determined the interobserver agreement of NAFLD pathologic features between pathologists and analyzed the causes of the disagreement in order to define the histopathological features in more detail as the basis for a grading system.
MATERIALS AND METHODS
Thirty-one patients with clinically and pathologically diagnosed NAFLD from 10 hospitals (Daegu Catholic University Medical Center, Dong-A University Hospital, Samsung Medical Center, Seoul National University Hospital, Inje University Seoul Paik Hospital, Seoul St. Mary’s Hospital, Soon Chun Hyang University Seoul Hospital, Wonju Severance Christian Hospital, Inha University Hospital, Chungnam National University Hospital) were selected. Selection criteria were clinical NAFLD (nonalcoholic, serologically negative for viral and autoimmune markers, abnormal levels of liver enzymes such as aspartate aminotransferase and alanine aminotransferase) and age ≥19 years. Cirrhosis cases were included, and cases of drug-induced or toxic injury were excluded. Fifty-one liver biopsies from the 10 hospitals were collected. Among them, 31 biopsies (≥1.5 cm in length and ≥16 G needle size) were selected. One hematoxylin and eosin (H&E)- and one Masson’s trichrome–stained slide were selected from each of the 31 cases. The biopsy specimens were anonymized and randomized by a researcher not involved in the study. All selected slides were scanned by a virtual slide scanning system (3DHISTECH Ltd., Budapest, Hungary) at Asan Medical Center in Seoul.
The following 12 NAFLD pathologic features were selected: steatosis grade, steatosis location, microvesicular steatosis, fibrosis stage, lobular inflammation, microgranuloma, large lipogranuloma, portal inflammation, ballooning degeneration, acidophilic bodies, Mallory’s hyaline, and glycogenated nuclei. Each parameter was reviewed and scored using the detailed scoring criteria shown in Table 1.
One H&E- and one Masson’s trichrome–stained virtual slide from each case were reviewed for the 12 parameters. Reviews were performed blindly by 13 pathologists from a gastrointestinal study group of the Korean Society of Pathologists. The degree of interobserver agreement for the first review was analyzed by multi-rater kappa statistics.
The results were shared with all 13 pathologists, and a consensus meeting was held after the first review to analyze the reasons for disagreement and to define the morphologic criteria in more detail.
After the consensus meeting, a second review of the 12 pathological parameters was performed using glass slides from each of the 31 cases. The degree of interobserver agreement after the second review was analyzed by multi-rater kappa statistics and compared with the results of the first review.
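The source does not state which multi-rater kappa variant was used. As a minimal sketch, assuming Fleiss' kappa (a common choice when a fixed number of raters each score every case), the statistic could be computed as follows. The function name `fleiss_kappa` and the example scores are hypothetical and are not taken from the study data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of category labels
    (one label per rater). Assumes every case has the same number of raters."""
    n_cases = len(ratings)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = {label for row in ratings for label in row}

    # Observed agreement: mean proportion of agreeing rater pairs per case.
    per_case = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement: sum of squared overall category proportions.
    total = n_cases * n_raters
    p_chance = sum(
        (sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories
    )

    if p_chance == 1.0:
        # Every rating fell into a single category; kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical scores: 3 cases, each graded 0-3 by 4 raters.
example = [[1, 1, 1, 2], [0, 0, 0, 0], [2, 2, 3, 3]]
print(round(fleiss_kappa(example), 4))
```

In the setting described here, `ratings` would hold 31 rows (cases) of 13 labels (pathologists) for one parameter, and the function would be called once per parameter and per review round.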
The Institutional Review Board of Seoul St. Mary’s Hospital approved this study (KIRB-00562_5-001).
RESULTS
Kappa values of interobserver agreement for the first review ranged from 0.0091 to 0.7618 (Table 2). In descending order of kappa value, the parameters were acidophilic bodies (k=0.7618), portal inflammation (k=0.5914), large lipogranuloma (k=0.4822), Mallory’s hyaline (k=0.4603), steatosis grade (k=0.3830), steatosis location (k=0.3388), fibrosis (k=0.3303), glycogenated nuclei (k=0.3218), ballooning degeneration (k=0.1923), lobular inflammation (k=0.1398), microgranuloma (k=0.0984), and microvesicular fatty change (k=0.0091). The kappa values of the four major pathologic features (steatosis grade, lobular inflammation, ballooning degeneration, and fibrosis) were 0.3830, 0.1398, 0.1923, and 0.3303, respectively. In particular, ballooning degeneration (k=0.1923), which is an important feature for diagnosis of NASH, showed a low level of agreement.
Kappa values of interobserver agreement for the second review ranged from 0.1199 to 0.7386 (Table 2). In descending order of kappa value, the parameters were portal inflammation (k=0.7386), acidophilic bodies (k=0.6493), steatosis grade (k=0.5638), Mallory’s hyaline (k=0.5236), large lipogranuloma (k=0.5004), fibrosis (k=0.4664), steatosis location (k=0.4502), glycogenated nuclei (k=0.3846), ballooning degeneration (k=0.3362), microvesicular fatty change (k=0.2916), lobular inflammation (k=0.2815), and microgranuloma (k=0.1199). The kappa values of interobserver agreement increased for all parameters except acidophilic bodies. Microvesicular steatosis demonstrated the largest improvement (k=0.0091 to 0.2916), whereas microgranuloma improved only marginally (k=0.0984 to 0.1199) and remained the parameter with the lowest agreement. All kappa values of the four major pathological features increased as follows: steatosis grade from k=0.3830 to 0.5638, lobular inflammation from k=0.1398 to 0.2815, ballooning degeneration from k=0.1923 to 0.3362, and fibrosis from k=0.3303 to 0.4664.
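The per-parameter change between the two reviews can be tabulated directly from the kappa values reported above. The following is a small illustrative sketch: the variable names are hypothetical, and the values are copied from the text rather than recomputed from the raw scores.

```python
# Kappa values as reported in the text for the first (virtual slide)
# and second (glass slide, post-consensus) reviews.
first_review = {
    "acidophilic bodies": 0.7618,
    "portal inflammation": 0.5914,
    "large lipogranuloma": 0.4822,
    "Mallory's hyaline": 0.4603,
    "steatosis grade": 0.3830,
    "steatosis location": 0.3388,
    "fibrosis": 0.3303,
    "glycogenated nuclei": 0.3218,
    "ballooning degeneration": 0.1923,
    "lobular inflammation": 0.1398,
    "microgranuloma": 0.0984,
    "microvesicular fatty change": 0.0091,
}
second_review = {
    "acidophilic bodies": 0.6493,
    "portal inflammation": 0.7386,
    "large lipogranuloma": 0.5004,
    "Mallory's hyaline": 0.5236,
    "steatosis grade": 0.5638,
    "steatosis location": 0.4502,
    "fibrosis": 0.4664,
    "glycogenated nuclei": 0.3846,
    "ballooning degeneration": 0.3362,
    "lobular inflammation": 0.2815,
    "microgranuloma": 0.1199,
    "microvesicular fatty change": 0.2916,
}

# Change in kappa per parameter (second minus first), rounded to 4 decimals.
change = {
    name: round(second_review[name] - first_review[name], 4)
    for name in first_review
}

largest_gain = max(change, key=change.get)
decreased = [name for name, delta in change.items() if delta < 0]

# Print parameters from largest gain to largest loss.
for name, delta in sorted(change.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:28s} {first_review[name]:.4f} -> {second_review[name]:.4f} ({delta:+.4f})")
print("largest gain:", largest_gain)
print("decreased:", decreased)
```

A comparison like this only describes the reported point estimates; it says nothing about whether a given change is statistically significant.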
DISCUSSION
Since the first description of NASH by Ludwig et al. in 1980 [18], several NAFLD scoring schemes have been proposed [16,17,19,20]. Among them, the NAFLD activity score (NAS) proposed by Kleiner et al. [17] is the best known and most widely used. The NAS system is based on agreement data and a multiple regression analysis of 14 histological features: steatosis grade, steatosis location, microvesicular steatosis, fibrosis, lobular inflammation, microgranuloma, large lipogranuloma, portal inflammation, ballooning degeneration, acidophilic bodies, pigmented macrophages, megamitochondria, Mallory’s hyaline, and glycogenated nuclei. The NAS is defined as the unweighted sum of the scores for steatosis (0–3), lobular inflammation (0–3), and ballooning (0–2) and ranges from 0 to 8. Fibrosis was not included as an NAS component. Kleiner et al. [17] reported that the interobserver agreement values for the four major features were 0.79 for steatosis grade, 0.45 for lobular inflammation, 0.56 for ballooning degeneration, and 0.84 for fibrosis. The agreement for other histologic features ranged from k=0.15 to 0.58. However, the histomorphological features of some parameters remain ambiguous, contributing to low interobserver agreement. We studied interobserver agreement among 13 pathologists for each of the 12 well-known parameters and analyzed the reasons for disagreement. At the first circulation of slides, we reviewed each case without consensus to identify current discrepancies in diagnostic criteria. The kappa values in this review ranged widely from 0.0091 to 0.7618. Kappa values of the four major pathological features at the first review were 0.3830 for steatosis grade, 0.1398 for lobular inflammation, 0.1923 for ballooning degeneration, and 0.3303 for fibrosis, lower than those of Kleiner et al. [17].
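The NAS definition given above (an unweighted sum of three component scores) can be written as a short function. This is only an illustrative sketch; the function and parameter names are hypothetical, and the component ranges are those stated in the text.

```python
def nafld_activity_score(steatosis, lobular_inflammation, ballooning):
    """Unweighted sum of steatosis (0-3), lobular inflammation (0-3),
    and ballooning (0-2). Fibrosis is deliberately not a component."""
    components = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    for name, (value, upper) in components.items():
        # Reject non-integer or out-of-range component scores.
        if not isinstance(value, int) or not 0 <= value <= upper:
            raise ValueError(f"{name} must be an integer from 0 to {upper}")
    return steatosis + lobular_inflammation + ballooning


# Hypothetical biopsy: steatosis grade 2, lobular inflammation 1, ballooning 1.
print(nafld_activity_score(2, 1, 1))
```

Because the sum is unweighted, disagreement on any single component shifts the total by the same amount, which is one reason low agreement on ballooning or lobular inflammation matters for the overall score.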
After the first review, we discussed several points of debate surrounding the definition of each parameter. As a result, we identified several details regarding steatosis grade, lobular inflammation, ballooning degeneration, fibrosis, Mallory’s body, and microvesicular fatty change and recommend the following:
(1) Steatosis grade: steatosis grade should be determined by fat volume rather than the number of fatty hepatocytes at 100× optical magnification (Fig. 1).
(2) Lobular inflammation: lobular inflammation should be graded under 200× magnification throughout the entire biopsy field, and the mean count across fields, not the maximum count in the most active field, should be used. Spotty necrosis of hepatocytes, lymphocyte aggregations, and acidophilic bodies should be included, whereas lipogranuloma resulting from fat phagocytosis should be excluded, as in the histomorphologic criteria for chronic hepatitis grading (Fig. 2).
(3) Ballooning degeneration: the histomorphological criteria of ballooning degeneration are enlarged round cells with loss of polygonal features and cytoplasm showing heterogeneous granular features (Fig. 3). Hydropic swelling and microvesicular fatty changes should be carefully distinguished from ballooning degeneration. In cases of hydropic swelling, the hepatocyte has large, swollen, and homogeneously granular cytoplasm with well-preserved polygonal features. In particular, hepatocytes with microvesicular fatty changes can also be enlarged and can be confused with ballooning degeneration. Microvesicular fatty changes show centrally located nuclei indented by a small fat droplet with a lipoblast-like feature (Fig. 4). When only one or two ballooned cells can be seen throughout the entire field, the term “few” can be applied.
(4) Fibrosis: mild fibrosis should be carefully distinguished from the normal framework around the central area. Only obvious fibrosis with pericellular collagen deposition should be considered to indicate fibrosis (Fig. 5).
(5) Mallory’s body: Mallory’s body should be defined as a definite eosinophilic lump in the cytoplasm (Fig. 3B).
(6) Microvesicular fatty changes: these are defined as lipoblast-like features showing a centrally located nucleus and numerous intracytoplasmic micro-fat vacuoles inducing nuclear indentation. Microvesicular fatty changes should be carefully differentiated from cells having small- or medium-sized fat vacuoles without cytoplasmic enlargement and nuclear indentation, as mentioned by Yeh and Brunt (Fig. 6) [21].
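The effect of recommendation (2), grading lobular inflammation by the mean rather than the maximum count, can be illustrated with a short sketch. The cut-offs below are an assumption taken from the commonly cited NAS scheme (none, <2, 2 to 4, and >4 foci per 200× field) and may differ from the criteria in Table 1; the function name and the foci counts are hypothetical.

```python
from statistics import mean


def lobular_inflammation_grade(foci_per_field):
    """Grade lobular inflammation from the mean number of inflammatory foci
    per 200x field. Cut-offs are assumed (NAS-style), not taken from Table 1."""
    if not foci_per_field:
        raise ValueError("at least one field count is required")
    average = mean(foci_per_field)
    if average == 0:
        return 0
    if average < 2:
        return 1
    if average <= 4:
        return 2
    return 3


# Hypothetical counts from eight 200x fields, with one unusually active field.
fields = [0, 1, 0, 5, 1, 0, 1, 0]
grade_by_mean = lobular_inflammation_grade(fields)
grade_by_max = lobular_inflammation_grade([max(fields)])
print("grade using mean:", grade_by_mean)
print("grade using most active field:", grade_by_max)
```

With these invented counts, a reader who grades from the single most active field arrives at a higher grade than one who averages over the whole biopsy, which is the kind of discrepancy the consensus criterion is meant to remove.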
Kappa values for all parameters except acidophilic bodies increased in the second review based on the above criteria. In particular, kappa values for the four major parameters increased from 0.3830 to 0.5638 for steatosis grade, from 0.1398 to 0.2815 for lobular inflammation, from 0.1923 to 0.3362 for ballooning degeneration, and from 0.3303 to 0.4664 for fibrosis.
This increased agreement likely resulted from the consensus meeting and the more detailed histomorphologic criteria. However, glass slides, which were used in the second review, are more familiar to most pathologists than virtual slides and may also have contributed to the increased agreement. Despite the difference in review method (virtual vs. glass slides), the exact histomorphologic criteria of NAFLD clearly remain ambiguous and contribute to low interobserver agreement between pathologists.
Therefore, our more detailed suggestions for NAFLD histomorphologic criteria, including those for steatosis grade, lobular inflammation, ballooning degeneration, and fibrosis described above, should increase the accuracy of diagnosis and grading of NAFLD and improve interobserver agreement. We expect that this work and these recommendations will provide a more exact basis for NAFLD research and for the development of a new grading and scoring system.
Notes
Conflicts of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This study was supported by the Academic Research Fund 2012 from the Korean Society of Pathologists. We appreciate all members of the Gastrointestinal Pathology Study Group of the Korean Society of Pathologists, particularly Kyung Bun Lee for statistical analysis and Eunsil Yu for scanning the virtual slides.