Hazard ranking of wastewater sources in a highly polluted river in northern Taiwan by using positive matrix factorization with metal elements

Improving water quality is a critical issue worldwide. However, the general parameters (i.e., temperature, pH, turbidity, total solids, fecal coliform, dissolved oxygen, biochemical oxygen demand, phosphates, and nitrates) used in water quality index estimations are unable to identify pollution from industrial wastewater. This study investigated pollution sources at a river pollution hotspot by using the positive matrix factorization (PMF) model. A two-phase sampling collection along a highly polluted river in northern Taiwan was designed. The sampling spots were distributed along the river in Phase I to monitor the spatial variation of river pollutants. A pollution hotspot was determined based on two indices, namely the summed concentrations of metal elements and a metal index (MI). In Phase II, the river water samples were collected from the hotspot twice daily over 30 consecutive days to monitor the temporal variation of river pollutants. Source profiles of metal elements were obtained during the monitoring period. The Phase II samples were then factorized using the PMF model. Factor profiles retrieved from the PMF model were further assigned to industrial categories through Pearson correlation coefficients and hierarchical classification. The results indicated that the main pollution source was bare printed circuit boards (BPCB), which contributed up to 92% of the copper in the pollution hotspot. In terms of MI apportionment of 11 metals related to health effects, BPCB contributed 91% of the MI in high pollution events. Overall, the MI apportionment provides linkages between pollution level and human health. This is an evidence for policymakers that the regulation of the effluents of BPCB is an effective means to controlling copper concentrations and thus improving water quality in the study area.


Introduction
On September 25, 2015, the 193 member states of the United Nations adopted the 2030 Agenda for its 17 Sustainable Development Goals (SDGs) [1]. The agenda was expected to guide the actions of international communities over the next 15 years (2016-2030). One of the 17 SDGs involves improving water quality by reducing river water pollution. The causes of river water pollution include industrial activity, domestic sewage, livestock wastewater, and agricultural wastewater.
The water quality index (WQI) is commonly used to evaluate general water quality [2][3][4]. Multiple parameters are involved in the estimation of WQI scores, and weighted coefficients are assigned to each parameter based on statistical surveys. The general parameters are

Open Access
Sustainable Environment Research *Correspondence: changfu@ntu.edu.tw temperature, pH, turbidity, total solids, fecal coliform, dissolved oxygen, biochemical oxygen demand, phosphates, and nitrates [5,6]. However, an index using these parameters generally cannot be used to measure pollution from industrial wastewater.
River water influenced by discharge from an industrial district usually has a complex composition including metal elements, water-soluble ions, and volatile organic compounds. Stable and non-degradable metal elements could act as suitable tracers for monitoring pollution discharged by industries [7,8]. A metal index (MI) based on maximum allowable concentration (MAC) has been introduced to evaluate the quality of water resources [9,10]. Usually, the MAC of an element is set by regulatory agencies to protect human health and ecological environment, which makes the MI a valuable indicator.
To decrease the health hazards of polluted water and improve water quality, identifying pollution sources is critical. Various statistical techniques have been applied for source apportionment. Principle component analysis (PCA) has been widely used in the source apportionment of water pollution [11][12][13][14][15]. PCA yields linear combinations of original features and can transform large datasets into a set of factors with reduced dimensions. However, PCA can only provide qualitative information, and it is susceptible to outliers. Furthermore, the results of PCA does not have physical meaning because the values of factors decomposed using PCA can be negative. Positive matrix factorization (PMF) is recognized as an effective scientific tool and as an improved factor analysis tool for source apportionment [16]. Many studies have successfully applied PMF to the source apportionment of ambient particulate matter [17,18]. In recent years, PMF has also been used to investigate pollution sources in aqueous environments (e.g., [19][20][21][22][23]). PMF has the following advantages: (1) it includes the integration of non-negative constraints; (2) uncertainties in the data are introduced into the model; (3) it does not need to be orthogonal, which makes the formed factors close to real profile of pollution sources; (4) it quantifies source contributions; and (5) the factors are not excessively influenced by outliers [24][25][26]. However, in most of these studies, source apportionment was estimated using spatial samplings, which neglected temporal variation in the source contributions. Furthermore, the number of metal species was limited and may be insufficient to distinguish complex industrial sources.
The objective of this study was to use metal species with a receptor model to estimate potential source profiles and their contributions in a river pollution hotspot. The Ta-Liao-Keng River was selected as an example to demonstrate the feasibility of the approach. To the best of our knowledge, this is the first study adopting sequential time series data of river water samples for PMF modeling with factor profiles identified by comparison with profiles obtained from industrial effluents. Potential sources of MI and their contributions were apportioned to clarify the relationship between pollution levels and human health.

Study area and river water sample collection
The Ta-Liao-Keng River is a tributary of the Dahan River, which is a major river in Taiwan (Fig. 1). The length of the Ta-Liao-Keng River is 12.3 km, and its catchment area is approximately 29.4 km 2 . The river has four branches and flows eastward from Taoyuan City to New Taipei City (A fishbone diagram is provided in Fig. S1 of Supplementary Materials). The main industrial area is located at a downstream section of the river and is especially concentrated on Tandigou, a tributary (S4 in Fig. 1). According to a report published by the local government in 2017, the Ta-Liao-Keng River was the main contributor of copper (Cu) to the Dahan River. Thus, further investigation of the source pollution, especially for Cu, in the Ta-Liao-Keng River is imperative.
For this study, a sampling scheme with two phases was designed. In the first phase (Phase I, end of April 2019), four sampling sites along the Ta-Liao-Keng River were selected to locate a hotspot for further investigation. The river water quality was monitored from downstream to upstream. Four additional sampling sites at four branches served as background sites. A total of eight river water samples were collected during this phase. During the sampling period, mild weather (with at most light rainfall) minimized the potential contributions of suspended solids from sediments to metal elements in the river.
Two pollution indices, the sum of elemental metal concentrations and MI, were applied to investigate the pollution hotspot. The MI was defined as where C i is the concentration of one of 11 controlled elements, i.e., silver (Ag), arsenic (As), cadmium (Cd), Cu, hexavalent chromium (Cr 6+ ), mercury (Hg), manganese (Mn), nickel (Ni), lead (Pb), selenium (Se), and zinc (Zn). Table 1 presents the MAC values of these 11 elements, which are enforced by Taiwan Environmental Protection Administration (TEPA), for surface water. Because trivalent chromium and Cr 6+ can convert back and forth, a measurement of total chromium can avoid missing one of these. Instead of using Cr 6+ , total chromium was measured and used to estimate the MI. Pollution is a cause for concern when the MI is greater than one. In the second phase (Phase II, July 2019), samples were collected twice daily over 30 consecutive days (N = 60) at the hotspot identified in Phase I. The collected samples were used to evaluate temporal variation in river water pollution and for PMF modeling.

Sampling of industrial wastewater
To investigate potential pollution sources and assign them to industrial categories, representative source profiles were needed. A total of 20 wastewater samples from 9 effluent discharges were collected as source profiles. Furthermore, because the river could be contaminated by untreated wastewaters which were discharged by factories unintentionally, three manufacturing units were selected for sampling based on previous violations. Most industrial wastewater samples were collected twice during the sampling time to reduce uncertainty in the source profiles (Table S1). Overall, samples were collected for six industrial categories: chemical materials and products (CMP), bare printed circuit boards (BPCB), electroplating products (EP), food manufacturing (FM), finishing of textiles (FT), and metal surface treatment (MST). To further investigate the association between human activities and the river pollution, domestic sewage (DS) was collected as a potential source. The wastewater samples were collected and preserved in accordance with the Standard Method [27].

Chemical analysis
To quantify metal elements and trace elements in river water and wastewater samples, inductively coupled plasma-mass spectrometry was used according to the Standard Method [28]. A total of 52 elements were analyzed and the method detection limit (MDL) of these elements ranged from 0.03 to 1.67 μg L − 1 (Table S2).

Source apportionment
The PMF 5.0 software released by U.S. Environmental Protection Agency was used for source apportionment in the second phase [29]. In PMF modeling, the mass balance equation is expressed as where X ij is the jth species concentration measured in the ith sample, g ik is the contribution of the kth source pollution in the ith sample, f kj is the source profile defined by the jth species concentration in the contribution of the kth source pollution, and e ij is the residual. The uncertainty of measurements in PMF is calculated using when the measured concentration is higher than the MDL [29]. When the measured concentration is less than the MDL, the concentration is replaced with half the MDL, and the uncertainty is estimated as follows: In PMF, the objective function is defined using where n is the number of samples and m is the number of species. The optimal values of contributions (g ik ) and sources profiles (f kj ) can then be retrieved by minimizing Q(E).
To determine the number of factors, PMF was run sequentially using a varied number of factors from 3 to 8. The scaled residual (r ij ) of each run is used to calculate the maximum individual column mean (IM), The appropriate number of factors is that for which the derivatives of IM and IS seem to level off [30].
In PMF modeling, bootstrapping is performed to evaluate the stability of the solution and the uncertainties of the source contribution estimates. The reproducibility of bootstrapping on each run (three to eight factors) is used to determine the number of factors, as well. In this study, 100 was chosen for the number of bootstrap runs. The steps taken in this research is shown in Fig. S2.

Spatial and temporal variations in river water measurements
The sum of elemental concentrations was low from the upstream reaches and gradually increased along the river in Phase I (Table S3, Sites S8, S5, S3, and S1). The two species with the highest concentrations were aluminum (Al) and iron (Fe); the sum of Al and Fe at S6 was 86% of the elemental concentrations. Al and Fe are the most abundant metals in the Earth's crust. To focus on industrial pollutants, Al and Fe were eliminated from the following estimations. The highest elemental concentration was observed at Tandigou (S4), a tributary of the Ta-Liao-Keng River. The concentration at S4 was almost six times greater than that of the background water (S8). The second highest value was observed in the other tributary, S2. The high concentrations in tributaries instead of in the main stream was probably due to inputs from domestic wastewater and industrial effluents in the downstream area. The MI at S4 was up to 42.8, 17 times higher than that at S8, which indicated high levels of pollution at S4. Based on the spatial analysis of the metal elements in Phase I, S4 was chosen as the pollution hotspot. In Phase II, the pollution hotspot was monitored twice daily over 30 consecutive days beginning on July 1, 2019. Summary statistics for the measured elemental concentrations are provided in Table S2. The three elements with the highest mean concentrations (exclude Al and Fe) were Cu (1246 μg L − 1 ), Zn (558 μg L − 1 ), and Ni (313 μg L − 1 ). When applying the PMF model, data pretreatment is crucial. Species must be removed if more than 70% of the data points are below the detection limit. In this study, the following 13 species were excluded: dysprosium, erbium, europium, hafnium, holmium, iridium, platinum, ruthenium, samarium, tellurium, thulium, ytterbium, and zirconium. Most of the excluded species are rare earth elements and precious metals. Furthermore, data quality is evaluated by signal-to-noise ratios in PMF modeling and inadequate species can be weighted down or eliminated during the optimization. We eliminated six elements with poor fit, namely gold, boron, cobalt, molybdenum, antimony, and tantalum. Finally, 31 of the 50 elements were included in the PMF model. The highest total elemental concentration at S4 was observed on July 1 when the concentration was up to 13,465 μg L − 1 (Fig. 2). The second highest total elemental concentration, 9100 μg L − 1 , was observed on July 15. In terms of the MI, the largest value, 258, was also observed on July 1 (Fig. 3). The second largest MI value, 170, was observed on July 4. A discrepancy was discovered when defining the pollution events using two indices. This is discussed in the following sections.

Characteristics of the metal elements in industrial wastewater
Most industrial wastewater was collected twice, and the samples for each factory were averaged to create a composite profile. The composite profiles of eight industrial and three manufacturing unit (with "m" in front of the category) discharges are illustrated in Fig. 4. CMP was represented by a battery factory, which emitted a high concentration of Pb. The profiles for BPCB were dominated by Cu, which comprised up to 82% of total elemental concentrations. The profiles of the EP were dominated by Ni, with proportions of up to 88%. FM was represented by a factory making frozen food, where discharge had high concentrations of strontium (Sr) and rubidium. The profile of FT was dominated by Sr and Mn. The MST was represented by a manufacturer of aluminum-related products, and Mn and Ni were dominant in its profile. The profile of DS was collected at an apartment complex   and was dominated by Mn and Sr. All compositions followed the effluent standards established by the TEPA.

Source apportionment
IM and IS analyses indicated that possible solutions were around six factors (Fig. S3). To identify the optimal solution, bootstrap analysis was introduced (Table S4). This revealed that five factors provided the most stable solution with the mapping rate > 80% for each factor. The averaged source contributions of total elemental concentrations (the sum of 31 elements) and of Cu are depicted in Fig. 5. Factor 4 was the greatest contributor (36 ± 5%) to total elemental concentrations, followed by Factor 1 (19 ± 8%) and Factor 5 (16 ± 4%). For Cu, Factors 4 and 1 offered even larger contributions (64 ± 23% and 28 ± 18%, respectively). To identify the industrial categories of the three major contributors, Pearson correlations and hierarchical classifications were used. Table 2 lists the correlations of Factors 1, 4, 5, with all source profiles collected in this study. Factors 1 and 4 had strong correlations (r ≥ 0.96) with BPCB1, BPCB2, and their manufacturing unit discharges. The correlation between Factor 1 and Factor 4 was as high as 0.99. This indicated that Factors 1 and 4 were in the same industrial category. Factor 5 was highly correlated with DS (r = 0.95). Figure 6 depicts a cluster dendrogram of the hierarchical classification of the factors and source profiles. Factor 1 was similar to BPCB1 and its manufacturing unit discharge, and Factor 4 was similar to BPCB2 and its manufacturing unit discharge. Among all source profiles and factors, BPCB1 and BPCB2 were prominent in the hierarchical clustering. Moreover, Factor 5 was closely related to DS. BPCB1 and BPCB2 are Cu-dominated, and the profile of DS is dominated by Mn and Sr. Overall, the profile matching based on the two methods was consistent, and the markers for the estimated factors corresponded to those in the measured profiles. Factors 1 and 4 were classified as BPCB1 and BPCB2, respectively, and Factor 5 was classified as DS. The measured total elemental concentrations and the contributions of Factors 1, 4, and 5 over time are presented in Fig. 2. The contribution of Factor 4 closely matched certain high pollution events. On July 4, Factor 4 contributed 86% of total elemental concentrations and decreased in the afternoon. Factor 1 was the main contributor (76%) of the highest pollution event on July 1 even though it was not a persistent pollution source. By contrast, Factor 5 was a stable, persistent source of pollution, but its contribution was relatively small. Concentrations of Cu were greater than 3000 μg L − 1 in many high pollution events (Fig. S4), and the ratio of Cu to total elemental concentration was > 50%. Factor 1 was the significant contributor of Cu on July 1. Factor 4 was the main source of Cu in high pollution events, except for the highest one. For Factor 1, Cu comprised 65% of the normalized profile (Fig. S5). Cu was also dominant in Factor 4 and represented up to 78% of total elemental concentrations. Instead of Cu, Mn and Sr were significant in Factor 5.

MI apportionment
These analyses in the previous section have focused on source apportionments for mass concentrations. From the perspective of protecting human and ecological health, further quantifying the contributions of sources due to MI, which considered both the concentrations and MAC of the main toxic elements, is needed. The source-specific MI of the pth source (MI p ) was calculated using where x * jp is the mean concentration of species j in source p, based on the model results, and MI j is the unit MI for species j.
The averaged MI was 53.9, with Cu contributing up to 77%. The other highly polluting species was Mn (10%), followed by Ni (6%). Figure 5c presents the MI apportionment results. Factor 4, which was classified as BPCB2, was the main MI contributor (53%) at S4. The second highest contributor (24%) was Factor 1, which was classified as BPCB1. The sum of MI contributions from BPCB1 and BPCB2 was up to 77%, indicating that BPCB was the industrial category with the greatest health risk at S4. Figure 3 presents the time series of the measured MI scores and those of the two largest contributors. To compare the source contribution during pollution events and nonevents, high pollution events (HPEs) were defined as those with MI > 99. 3 (Fig. 7). During GCs, BPCB1 and BPCB2 were also the largest health risk contributors, and the sum of their contributions was 79%. DS contributed the second greatest health risk, with a 15% contribution. Although DS was not a dominant source pollution in HPEs, it was a stable source pollution that contributed 9% of the averaged MI.

Limitations
In this study, the MI was considerably higher than the reference value because of stringent MAC of Cu in Taiwan. The guideline for drinking water quality published by the World Health Organization [31] is 2000 μg L − 1 for Cu, but the MAC of Cu is 30 μg L − 1 in Taiwan. This makes estimates of the MI higher if Cu is the main pollutant. Nevertheless, establishing stringent a standard for Cu can protect aquatic biota [32,33] and also human beings who are at the top of the food chain. Another limitation is that the result estimated by the PMF model provided only information on source contributions by industrial categories and not the exact source locations. However, this modeling approach is valuable in narrowing down pollution sources to specific industrial categories.

Conclusions
This study investigated the source contributions at a river pollution hotspot by using PMF. This was the first study to apply sequential time series data of metal species in PMF to apportion sources and their contributions by industrial category. The results revealed that the main source was BPCB, and the sum of BPCB1 and BPCB2 contributions was up to 55% of the total elemental concentrations and 92% of copper in the study area. Because Cu element was the main pollutant in BPCB which was the main contributor, this leads to severe copper pollution in Tandigou. BPCB contributed 75% of total elemental concentrations in MI. For both elemental concentrations and the MI, which is related to human and ecological health, BPCB was the main contributor. In further investigation of the source pollution in HPEs, the MI contribution from BPCB was up to 91%. The results indicated that BPCB contributed toxic pollutants and that led to a high percentage of MI contribution. Although the waste water samples showed that the discharge for each factory monitored in this study followed the effluent standard, the overall influence on the Cu concentration in river water might still be enormous. Implementing comprehensive control strategies, such as tightening the effluent regulations, centralizing the water treatment system, and enhancing monitoring are warranted, especially for the BPCP industry.
In this study, we demonstrated the feasibility of using metal species with a receptor model to identify source pollution. The model results might provide useful information to policymakers for the effective management of river environments. For further investigation of river pollution, establishing a comprehensive source profile database is highly recommended for source identification.