Accuracy of cervical cytology: comparison of diagnoses of 100 Pap smears read by four pathologists at three hospitals in Norway

Background: Cervical cancer can be prevented by early detection and treatment of precancerous lesions. Since 1995, there has been a national cervical cancer screening program in Norway, where women aged 25–69 years are recommended to take Pap smears every three years. There are 17 cytology laboratories covering a population of 5 million people. The detection rate of cervical abnormalities varies from laboratory to laboratory. We wanted to investigate the accuracy of cytology diagnoses by four different pathologists at three different hospitals in Norway.

Methods: One hundred Pap smears (20 Normal, 20 ASC-US, 20 LSIL, 20 ASC-H and 20 HSIL) screened at UNN in 2015 were evaluated by four pathologists at three hospitals in Norway. All patients were followed up through December 2016. Histologically confirmed high-grade dysplasia (CIN2+) was considered the study endpoint.

Results: The number of Pap smears evaluated as abnormal (ASC-US+) by the four pathologists varied from 61 to 85. The number of high-grade cytology diagnoses (ASC-H+) varied from 26 to 50. There was moderate agreement (weighted kappa 0.45–0.58) between the observers. There were 32 women with high-grade histology (CIN2+) in the follow-up, including 19 CIN2, 12 CIN3 and one squamous cell carcinoma (SCC). Using high-grade cytology (ASC-H+) as cut-off, the sensitivity for CIN2+ varied from 68.8% to 93.8% (mean 77.4%) and specificity from 70.6% to 95.6% (mean 81.3%). The pathologist with the highest sensitivity for CIN2+ had the highest false positive rate and the lowest specificity (p<0.05). The accuracy for CIN2+ varied from 74.1% to 83.8% (mean 79.4%). The Pap smear from the woman with cervical cancer was diagnosed as high-grade (ASC-H+) by one of the four pathologists.

Conclusions: Cervical cancer screening based on cytology has limited accuracy. The study revealed a moderate agreement between the observers, along with a trade-off between sensitivity and specificity. This might indicate that hospitals with high detection rates of cervical cytology have higher sensitivity for CIN2+ but lower specificity.

Electronic supplementary material: The online version of this article (10.1186/s12907-017-0058-8) contains supplementary material, which is available to authorized users.


Background
Cervical cancer is caused by human papillomavirus (HPV) and develops over many years through a series of precancerous steps [1,2]. The disease can be prevented by HPV vaccination or by screening with HPV tests or Pap smears [3,4]. Since 2009, there has been an HPV vaccination program for 12-year-old girls in Norway. The program's coverage is around 80% [5].
Since November 2016, there has been an ongoing two-year catch-up vaccination program for women aged 20-25 years, with an expected coverage rate of 40-45% (www.fhi.no). Since 2015, there has been a pilot for HPV testing in primary screening in four counties [6]. In this pilot, women 34 years and older are randomized to a Pap smear every three years or an HPV test every five years [6]. However, in most parts of Norway, the cervical screening program is still based on cervical cytology [5].
Since 1995, there has been a national cervical cancer screening program in Norway, where women aged 25-69 years are recommended to take Pap smears every three years [5]. Women with high-grade cytology (ASC-H / HSIL) are referred to a gynecologist for colposcopy and biopsy. An HPV test is used in the triage of women with low-grade cytology (ASC-US / LSIL). The cervical screening program has a coverage of 60% after 3.5 years. The Norwegian Cancer Registry sends a reminder to women without a Pap smear after three years and a new reminder after four years. The coverage is 80% after 5 years [5]. Most Pap smears are taken by GPs, while some samples are taken by gynecologists. There are 17 different laboratories involved in the screening program, and most of these use liquid-based cytology (ThinPrep or SurePath).
It is well known that cervical cytology has limited sensitivity and reproducibility [7][8][9][10][11][12]. Diagnoses may vary from cytotechnician to cytotechnician, from pathologist to pathologist and from lab to lab [9,11,12]. All cervical cytology diagnoses, results of HPV tests and biopsies from all laboratories in Norway are reported to the Norwegian Cancer Registry, which drafts annual reports with feedback to each laboratory, including the distribution of their diagnoses compared with the national average [5] (Table 1).
There is high variability in detection rates across hospitals. This may be due to higher sensitivity, lower specificity, regional differences in the prevalence of HPV, cervical dysplasia and cancer, or a combination of these causes. We wanted to investigate the accuracy of cytology diagnoses by four different pathologists at three different hospitals in Norway.

Methods
One hundred cervical cytological samples screened at UNN in 2015 with the diagnoses normal, ASC-US, LSIL, ASC-H and HSIL were sent to the Departments of Pathology in Bergen (HUS), Bodø (Nordland), Fredrikstad (Østfold), Stavanger (SUS) and Tønsberg (Vestfold). The pathologist at the Department of Pathology in Bergen did not have time to participate in the study and forwarded the slides to Stavanger without examining them. Two cytotechnologists at the Department of Pathology in Fredrikstad diagnosed the slides, but they were trained to screen SurePath samples. Because this study was based on ThinPrep samples, their results were excluded.
All slides were first screened by a cytotechnologist at UNN and then evaluated by a pathologist at UNN (P1, reference). The abnormal cells were marked on the slides before being dispatched for the study. The slides were not screened at the other hospitals. The four other pathologists (P2-P5) at the other hospitals were to evaluate only the abnormal cells marked on the slides. These pathologists were blinded to age, previous findings, clinical information and HPV result. Diagnoses from each of the four pathologists were compared with diagnoses from the three other pathologists. Women with abnormal findings at UNN were followed up according to national guidelines. In Norway, the Bethesda System for Reporting Cervical Cytology is used by all laboratories. All patients were followed up through December 2016. Histologically confirmed high-grade dysplasia (CIN2+) was considered the study endpoint (gold standard). When calculating the sensitivity and specificity, women with normal Pap smears, and women with low-grade cytology (ASC-US / LSIL) and a negative HPV test without histology, were considered free of high-grade dysplasia (CIN1-).
All analyses were done in IBM SPSS Statistics, version 23, with the Chi-square test for categorical variables and the t-test for continuous variables. To assess the agreement of cytological diagnoses between different observers, we used weighted kappa with linear weights.
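As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's kappa with linear weights for two observers, written in plain Python. It is not the SPSS procedure used in the study. The function name `linear_weighted_kappa` and the example ratings are hypothetical; only the ordered category list is taken from the diagnoses used in this study.

```python
# Ordered cytology categories used in this study (Bethesda-based).
CATEGORIES = ["Normal", "ASC-US", "LSIL", "ASC-H", "HSIL"]


def linear_weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    ratings_a and ratings_b are equal-length sequences of category labels
    given by two observers to the same samples.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")

    k = len(categories)
    index = {label: i for i, label in enumerate(categories)}
    n = len(ratings_a)

    # Cross-tabulate the two observers' diagnoses.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0  # observed weighted disagreement
    weighted_expected = 0.0  # disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            weighted_observed += weight * observed[i][j] / n
            weighted_expected += weight * row_totals[i] * col_totals[j] / (n * n)

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings for six slides, for illustration only.
observer_1 = ["Normal", "ASC-US", "LSIL", "ASC-H", "HSIL", "HSIL"]
observer_2 = ["Normal", "LSIL", "LSIL", "HSIL", "HSIL", "ASC-H"]
print(round(linear_weighted_kappa(observer_1, observer_2), 2))
```

With linear weights, a disagreement between adjacent categories (for example ASC-H versus HSIL) is penalized less than a disagreement between distant categories (for example Normal versus HSIL), which suits an ordered diagnostic scale.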

Results
The number of samples diagnosed as "Normal" varied from 15 to 39 by the four pathologists, with a mean of 28.8. One pathologist (P2) had significantly fewer "Normal" cases than the average of the four pathologists (p<0.05) (Table 3). The corresponding ranges for ASC-US, LSIL, ASC-H and HSIL were 17 to 24 (mean 19.8), 9 to 20 (mean 14.0), 10 to 18 (mean 13.3) and 16 to 32 (mean 24.0), respectively (Table 3), none of which were significant. There was moderate agreement between the observers (weighted kappa 0.45-0.58) (Table 4). The kappa statistics were not statistically different.
The agreement was higher for "Normal" and "HSIL" samples than for the other diagnoses (ASC-US, LSIL and ASC-H) (Additional file 1: Tables S1-S5). The number of samples with high-grade cytology (ASC-H+) varied from 26 (P4) to 50 (P2). Of 61 women with at least one high-grade cytology diagnosis, 17 samples (27.9%) were considered high-grade by all four observers (Additional file 1: Figure S1). The number of true positives (CIN2+) using ASC-H+ as a cut-off varied from 22 to 30 (mean 24.8) (Additional file 1: Figure S2 and Table 5).
The corresponding sensitivity for CIN2+ varied from 68.8% to 93.8% (mean 77.4%). One pathologist (P2) had significantly higher sensitivity than the average of the four pathologists (p<0.05) (Table 5). Of 32 women with CIN2+, 15 samples (46.9%) were considered high-grade by all four observers (Additional file 1: Figure S2). One woman with CIN2 was not considered to have high-grade cytology by any of the four observers (patient 57, Additional file 1: Table S3). The number of true negatives (CIN1-) using LSIL or lower (LSIL-) as a cut-off varied from 48 to 65 (mean 55.3). The corresponding specificity ranged from 70.6% to 95.6% (mean 81.3%) (Table 5). One pathologist (P2) had significantly lower specificity and one pathologist (P4) had significantly higher specificity than the average of the four pathologists (p<0.05) (see Table 5). The pathologist (P2) with the highest sensitivity for CIN2+ had the highest false positive rate and the lowest specificity (Table 5). The accuracy for CIN2+ varied from 74.1% to 83.8% (mean 79.4%). There were no statistically significant differences in accuracy (Table 5). The Pap smear from the woman with cervical cancer (SCC) was diagnosed as high-grade (ASC-H+) by one of the four pathologists (P2), while three pathologists diagnosed her as ASC-US (Additional file 1: Table S5). The woman had a positive HPV test for HPV type 16 (data not shown).
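The sensitivity and specificity figures above follow from simple counts. The sketch below reconstructs the 2x2 table for pathologist P2 from numbers reported in the text (50 ASC-H+ diagnoses, 30 true positives, 32 women with CIN2+ among 100 samples). The class `Confusion` and the helper `from_counts` are hypothetical names introduced for illustration, and the reconstruction assumes that all 100 samples were classified as either CIN2+ or CIN1-.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table for a cytology cut-off (ASC-H+) against histology (CIN2+)."""
    tp: int  # ASC-H+ and CIN2+
    fp: int  # ASC-H+ and CIN1-
    fn: int  # LSIL- and CIN2+
    tn: int  # LSIL- and CIN1-

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def from_counts(positive_calls: int, true_positives: int,
                diseased: int, total: int) -> Confusion:
    """Derive the full table from the counts given in the text."""
    fp = positive_calls - true_positives
    fn = diseased - true_positives
    tn = total - diseased - fp
    return Confusion(tp=true_positives, fp=fp, fn=fn, tn=tn)


# Pathologist P2: 50 ASC-H+ diagnoses, 30 of them CIN2+, 32 CIN2+ in total.
p2 = from_counts(positive_calls=50, true_positives=30, diseased=32, total=100)
print(p2)
print(f"sensitivity {p2.sensitivity:.1%}, specificity {p2.specificity:.1%}")
```

Under these assumptions the reconstructed table gives 48 true negatives, and the resulting sensitivity (30/32) and specificity (48/68) agree with the highest sensitivity and lowest specificity reported above.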

Discussion
The study's purpose was to investigate the accuracy of cytology diagnoses by four different pathologists at three hospitals using 100 Pap smears with different cytological diagnoses screened at UNN. The agreement of the cytological diagnoses between the four pathologists in this study was "moderate." A moderate agreement is better than "fair," but worse than "substantial." The kappa statistics were not statistically different.
In Norway there are 17 cytology laboratories covering a population of 5 million people [5]. All the laboratories receive most of their samples from general practitioners in primary screening. The population of Norway is quite homogeneous, and women in different parts of the country are broadly similar. The differences between the various laboratories are therefore probably caused by different interpretations of the Bethesda criteria. Two pathologists (P4 and P5) were from the same laboratory but still gave very different diagnoses for the same patients.
In the ATHENA study, the sensitivity of cytology varied from 42.0% to 73.0% [12]. In our study, the sensitivity for CIN2+ varied from 68.8% to 93.8%, but all the smears were first screened at the same hospital, and abnormal cells were marked on the slide. Abnormal cells are easy to find when they are already marked on the slide. In a population with a given prevalence of CIN2+, the sensitivity of cytology depends on the detection rate. In the ATHENA study, the positivity rate of cytology in primary screening varied from 3.8% to 9.9%, while the detection rate of the HPV DNA test (Cobas 4800) varied from 10.9% to 13.4% [12]. In our study, the detection rate of high-grade cytology (ASC-H / HSIL) varied from 26.0% to 50.0%, while the detection rate of the HPV DNA test (Cobas 4800) was 74.3% (52/70).
In our study, the accuracy varied from 74.1% to 83.8% (mean 79.4%). In five published studies the accuracy varied from 64.2% to 78.4% (mean 76.1%) (Table 6). There was less variation between the four pathologists in our study than between the five published studies. The mean accuracy of the four pathologists in our study was significantly higher than the mean of the five published studies (79.4% vs 76.1%, p<0.05).
There is a trade-off between sensitivity and specificity in cervical cancer screening. In our study, the pathologist with significantly higher sensitivity for CIN2+ than the average also had significantly lower specificity. In general, laboratories with a high detection rate of cytology also have higher sensitivity for CIN2+. If the sensitivity is higher, the hospital detects more women with CIN2/3 who can be treated, and fewer women develop cervical cancer before the next screening round. When women with low-grade cytology (ASC-US / LSIL) are triaged with an HPV test, a high detection rate of low-grade cytology should not be considered a major problem. A false positive ASC-US will have a negative HPV test and does not need follow-up. A false negative "Normal" cytology has no indication for HPV testing, according to Norwegian guidelines (www.kreftregisteret.no).