Assessment of surface water quality and monitoring in southern Vietnam using multicriteria statistical approaches

The study analyzed surface water quality fluctuations in the southern region of Vietnam using monitoring data from 58 locations. Seventeen water quality indicators were analyzed: pH, temperature, electrical conductivity (EC), dissolved oxygen (DO), total dissolved solids (TDS), total suspended solids (TSS), chemical oxygen demand (COD), biological oxygen demand (BOD), ammoniacal nitrogen (NH4+-N), nitrite nitrogen (NO2−-N), nitrate nitrogen (NO3−-N), iron (Fe), lead (Pb), cadmium (Cd), mercury (Hg), arsenic (As), and chloride (Cl−). The study used the water quality index (WQI), cluster analysis (CA), principal component analysis (PCA), and the entropy weight method to analyze surface water quality. The results showed that the surface water was contaminated with organic matter (low DO and high TSS, BOD, COD), nutrients (high NH4+-N) and Fe. While Pb at some locations exceeded the allowable limit of the national technical regulation, other indicators (Cd, Hg and As) were within this standard. Moreover, DO, TSS, BOD, COD, NH4+-N, Fe, EC, TDS, and Cl− fluctuated seasonally. WQI classified water quality from bad to very good (WQI = 42–100) due to the impact of hydrological conditions, navigation, wastewater from industrial zones, and fishing ports. The CA results suggested that the number of sampling locations in clusters 1–6 could be reduced by 11 (from 33 to 22 locations) and the monitoring frequency from 8 to 5 times per year while still ensuring the representativeness of water quality over time. This reduction would lower monitoring costs by 56.5%. The PCA identified five major potential sources explaining 87.3% and eight minor sources explaining only 12.7% of water quality variation. Temperature, pH, EC, DO, BOD, COD, NH4+-N, NO2−-N, Fe, Cl−, and Pb are important indicators that need to be monitored periodically. Furthermore, orthophosphate, total phosphate, total nitrogen, and coliforms should be considered for addition to the next monitoring program.
Locations with medium and bad water quality are concentrated in Dong Nai, Ho Chi Minh City and Long An, which requires finding effective solutions to improve the water quality. The current results can assist in decision-making related to environmental quality monitoring in the southern region of Vietnam.


Introduction
Water plays an important role for organisms and humans. Therefore, regular water quality monitoring is considered a top priority for all countries in the world [1,2]. Monitoring water quality not only helps countries assess and predict pollution but also provides information for planning the sustainable use of water resources [3–5]. Vietnam conducts annual environmental monitoring of surface water, groundwater, and seawater as part of its environmental management services. Responsibility for monitoring the water environment is assigned to the Ministry of Natural Resources and Environment and the People's Committees of the 63 provinces and cities. Water environment monitoring is also carried out by production and business establishments in accordance with the current environmental protection law [6]. Physical, chemical, and biological indicators are used to monitor water bodies in Vietnam. The physical and chemical parameters include temperature, pH, total suspended solids (TSS), turbidity, dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH4+-N), orthophosphate, heavy metals (iron (Fe), aluminum, manganese, chromium, cadmium (Cd), etc.), chloride (Cl−), sulfate, pesticides, and antibiotics; the biological indicators include Escherichia coli and coliforms [7]. Monitoring results are evaluated against the national technical regulation on surface water quality (QCVN 08-MT:2015/BTNMT) or with the water quality index (WQI) [7–9]. These assessment methods are simple and use only one or a few criteria at a time, so they do not fully exploit the information contained in the very large datasets [5,9–11].
Currently, multivariate statistical methods are widely used in water quality assessment. These methods can include all the water quality information in the calculation simultaneously and can therefore extract significant inferences from the dataset [5,11–14]. They include cluster analysis (CA) and principal component analysis (PCA), which have been applied to assess the quality of rivers, lakes, and groundwater [10,13,15–19]. The CA method is based on the similarity of water quality in space and time and can therefore be used to evaluate sampling locations and sampling frequencies [5,9–11]. In addition, entropy weights have been used to measure the importance of key parameters in each cluster of the CA over time and to identify the key parameters causing the differences between clusters [15,20]. Meanwhile, PCA extracts important information about the criteria affecting water quality and the potential pollution sources driving water quality fluctuations, which can then be used to identify representative indicators for water quality assessment [5,10,11,14,17,21].
The southern region of Vietnam is a dynamic economic zone with a high economic growth rate and large industrial, business and service activities. It comprises many industrial parks and handicraft production establishments widely dispersed across localities of different sizes and types. The development of industrial zones and clusters is not synchronized with the environmental technical infrastructure; many industrial zones and clusters do not yet have a centralized wastewater treatment system [4]. Along with socio-economic development, surface water quality is increasingly declining because water bodies receive domestic and industrial wastewater and agricultural runoff [4,17,22]. Domestic and industrial wastewater might contain hazardous substances from production processes, and residues of pesticides and fertilizers could be carried by runoff into surface water. Therefore, it is necessary to monitor water quality in areas affected by socio-economic development activities. This study applied multivariate statistical analysis to analyze water quality fluctuations using data from 58 monitoring locations in the southern region of Vietnam. The research results could provide useful information for the southern environmental management agency in reviewing and reevaluating the effectiveness of the surface water quality monitoring system.

Data analysis
Water quality characteristics were evaluated based on the average value of each criterion over the 8 sampling periods at 58 locations and presented as boxplots to describe the change in the water quality parameters over time. The Kolmogorov-Smirnov test was used to test for normal distribution at the 5% significance level using SPSS version 20.0.0 (IBM, Armonk, NY, USA). The WQI was calculated from temperature, pH, DO, BOD, COD, NH4+, NO2−, NO3−, As, Cd, and Pb according to the guidance of Decision 1460/QD-TCMT dated November 12, 2019 of the Vietnam Environment Administration on the issuance of a manual for calculating the WQI [8]. The WQI takes values from 0 to 100 and is divided into 6 levels, as presented in Table 2. In this study, the WQI was calculated using Microsoft Excel 2016, and the geographic information system software QGIS version 3.16 was used to present its spatial distribution.
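As a reading aid, mapping a WQI score to one of the six levels can be sketched as follows. Table 2 is not reproduced in this section, so the breakpoints and the English level names below are assumptions based on the published VN_WQI guidance and should be checked against Table 2; the function name `wqi_level` is hypothetical.

```python
# Minimal sketch: map a WQI score (0-100) to a quality level.
# ASSUMPTION: breakpoints and level names are taken from the general
# VN_WQI guidance, not from Table 2 of this paper; verify before use.
WQI_LEVELS = [
    (91, "very good"),
    (76, "good"),
    (51, "medium"),
    (26, "bad"),
    (10, "very bad"),
    (0, "heavily polluted"),
]


def wqi_level(wqi):
    """Return the level name for a WQI score between 0 and 100."""
    if not 0 <= wqi <= 100:
        raise ValueError("WQI must lie between 0 and 100")
    for lower_bound, label in WQI_LEVELS:
        if wqi >= lower_bound:
            return label


# The two ends of the WQI range reported in this study (42-100).
for value in (42, 100):
    print(value, wqi_level(value))
```

Under these assumed breakpoints, the reported range of 42 to 100 spans the "bad" to "very good" levels, which agrees with the classification stated in the text.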
CA was used to group water quality by sampling location and sampling frequency, using the Euclidean distance [10]. CA results are presented as a tree structure (dendrogram). Sampling locations or times with similar water quality are placed in the same group based on the linkage distance [11]. The linkage distance is expressed as the ratio of the linkage distance (Dlink) to the maximal distance (Dmax). The linkage distance between clusters is considered to have clustering significance when (Dlink/Dmax) × 100 = 60 [11]. For the clusters obtained by sampling frequency, the water quality parameters of each cluster were weighted using the entropy information method and ranked by importance following Li et al. [24]. The larger the information coefficient, the lower the entropy weight and the smaller the impact on water quality [24]. Entropy weights can therefore measure the degree of order of the parameters in a cluster. Specifically, the higher the weight of a water quality parameter, the greater its influence; conversely, the smaller the weight, the less significant its influence [24].
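The entropy weighting step can be illustrated with a minimal sketch. It uses a common textbook formulation (column-wise proportions, Shannon entropy scaled by ln m, weights proportional to 1 minus entropy); the exact normalization used by Li et al. [24] may differ, and the function name and the sample values are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns (parameters) of a samples x parameters table.

    ASSUMPTIONS: values are non-negative, every column has a positive sum,
    there are at least two rows, and at least one column is not constant.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each parameter needs a positive column sum")
        entropy = 0.0
        for value in column:
            share = value / total
            if share > 0:  # 0 * ln(0) is treated as 0
                entropy -= share * math.log(share)
        entropy /= math.log(n_rows)  # scale entropy to the range 0..1
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence <= 0:
        raise ValueError("all parameters are constant; weights are undefined")
    return [d / total_divergence for d in divergences]


# Hypothetical values: column 0 is constant, column 1 varies between samples.
samples = [[2.0, 10.0], [2.0, 30.0], [2.0, 20.0]]
weights = entropy_weights(samples)
print([round(w, 3) for w in weights])
```

In this formulation a parameter that is identical across samples carries no information and receives a weight close to zero, while a parameter that varies strongly between samples receives a larger weight.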
PCA was used to identify potential sources of pollution and key indicators affecting water quality in the southern region of Vietnam. Potential sources affecting water quality were determined based on the eigenvalues. If the eigenvalue of a principal component (PC) is greater than 1, the PC is considered to contribute significantly to the variability of the data (it can be considered a main source); in contrast, when the eigenvalue of a PC is lower than 1, the PC contributes insignificantly to the variability of the data (it can be called a secondary source) [21]. Meanwhile, the weighted correlation coefficient (loading) was used to determine the main indicators affecting water quality. The weighted correlation coefficient is divided into three levels, high, moderate and weak, with absolute values of > 0.75, 0.75–0.50 and 0.50–0.30, respectively [10]. The higher the correlation coefficient of a parameter, the greater its contribution to water quality variability and the greater the need to monitor it [10]. CA and PCA were performed using Statgraphics Centurion version XVI software (Statgraphics Technologies, Virginia, USA).
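The two decision rules described above (eigenvalue greater than 1 for main sources, and the three loading levels) can be sketched as follows. The function names and the eigenvalues are hypothetical and are not results of this study; how the boundary values 0.50 and 0.30 are assigned, and the "negligible" label for loadings below 0.30, are assumptions.

```python
def split_components(eigenvalues):
    """Return (main, secondary) lists of 1-based PC numbers using eigenvalue > 1."""
    main = [i for i, ev in enumerate(eigenvalues, start=1) if ev > 1]
    secondary = [i for i, ev in enumerate(eigenvalues, start=1) if ev <= 1]
    return main, secondary


def explained_share(eigenvalues, components):
    """Fraction of total variance carried by the given 1-based PC numbers."""
    return sum(eigenvalues[i - 1] for i in components) / sum(eigenvalues)


def classify_loading(loading):
    """Classify a loading by absolute value: > 0.75, 0.75-0.50, 0.50-0.30."""
    size = abs(loading)
    if size > 0.75:
        return "high"
    if size >= 0.50:  # ASSUMPTION: 0.50 itself counts as moderate
        return "moderate"
    if size >= 0.30:  # ASSUMPTION: 0.30 itself counts as weak
        return "weak"
    return "negligible"  # ASSUMPTION: label not defined in the text


# Hypothetical eigenvalues for illustration only.
hypothetical_eigenvalues = [2.5, 1.5, 0.6, 0.4]
main, secondary = split_components(hypothetical_eigenvalues)
print(main, secondary)
print(round(explained_share(hypothetical_eigenvalues, main), 2))
print(classify_loading(-0.8), classify_loading(0.6), classify_loading(0.35))
```

The sign of a loading is ignored when assigning a level, since the thresholds in the text are stated as absolute values.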

Evaluating surface water quality in southern Vietnam
Temperature ranged from 25.3 to 35.3 °C across the sampling months (April to December) and from 28.6 to 31.4 °C across locations, with an average over the 58 sites and eight sampling months of 30.1 °C. pH ranged from 5.2 to 9.0 over time and from 5.9 to 7.9 across locations, with a median over the 58 sites and eight months of 7.1 (Fig. 2a). The pH at some locations was slightly alkaline and exceeded the allowable limit of 8.5 [7], but it was still within the allowable limit of 9. EC and TDS fluctuated greatly (Fig. 2c and d). DO in the study area was lower than the permissible limit of ≥ 6 mg L−1 [7] and was highly variable in April and July (Fig. 2e). BOD at the 58 survey sites varied from 1.6 to 48.8 mg L−1, while BOD between sampling months ranged from 0 to 79 mg L−1. BOD in the rainy season months (May–September) fluctuated more than in the dry season months (Fig. 2f). BOD at most locations exceeded the allowable limit of 4 mg L−1 [7]. Similarly, COD ranged from 8 to 91 mg L−1 between months and from 3 to 139 mg L−1 between sampling sites, with a mean value of 17.5 mg L−1 (Fig. 2g). COD exceeded the allowable limit (10 mg L−1) of QCVN 08-MT:2015/BTNMT, column A1 [7]. TSS ranged from 9 to 111 mg L−1 across sites and from 3 to 495 mg L−1 across months, with an average of 36 mg L−1 (Fig. 2b). TSS varied seasonally and was usually higher in the rainy season than in the dry season. TSS at most locations exceeded the allowable limit (20 mg L−1).
The concentration of NH4+-N ranged from 1.0 to 24.9 mg L−1 between sampling months and from 0.1 to 15.0 mg L−1 between sampling sites (Fig. 2h). In locations with high NH4+ concentrations and low DO, nitrite nitrogen accumulated and exceeded the allowable limit of 0.05 mg L−1 [7] (Fig. 2i). NO3−-N ranged from 0 to 3.7 mg L−1 between sampling months and from 0.1 to 2.0 mg L−1 between sampling sites, with a mean value of 0.8 mg L−1. The NO3− concentration was highest at locations such as Cat Lai ferry terminal (Dong Nai), the Ho Chi Minh City area, Tan Thanh wharf (Long An) and Vedan Port (Tay Ninh). The NO3− concentration in April was significantly higher than in other months. The fluctuation of the NO3− concentration depends on the concentrations of NH4+ and DO. The limit value of NO3−-N is 2 mg L−1 according to QCVN 08-MT:2015/BTNMT, column A1 [7], so the regulation was exceeded only at some locations at certain times.
The maximum Fe concentration was 4.4 mg L−1 across locations and 16.1 mg L−1 across sampling times, with an average of 1.8 mg L−1. Fe concentrations in all samples exceeded the allowable limit of 0.5 mg L−1 [7]. Fe in the rainy season tended to be higher than in the dry season (Fig. 2o). High Cl− concentrations were concentrated in the area of Ho Chi Minh City near the sea (HCM1, HCM10-12), where they exceeded the allowable limit of 250 mg L−1 [7]. All remaining locations had Cl− concentrations within the allowable limit. Cl− showed a marked seasonal variation, with higher values in the dry season than in the rainy season. The trend of Cl− fluctuations was similar to that of EC and TDS, since high Cl− concentrations lead to high EC and TDS (Fig. 2c and d). Pb in the study areas was within the allowable limit of 0.02 mg L−1 [7], except at some locations such as Ong Buong bridge (HCM5 in May), lower Tri An dam (BD6 in June) and Be River estuary (BP4 in June), where it exceeded the allowable limit (Fig. 2m). Cd and As at all locations and sampling sessions were within the allowable limits of 0.005 and 0.01 mg L−1, respectively (Fig. 2l and n) [7].
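The comparison against the regulation used throughout this section can be sketched as a simple threshold check. The limits below are the values quoted in the text for QCVN 08-MT:2015/BTNMT [7] and the sample values are the means reported above; the dictionary keys and the function name `exceedances` are hypothetical, and pH is omitted because only its upper limits are quoted here.

```python
# Maximum allowable concentrations (mg/L) as quoted in this section.
MAX_LIMITS = {
    "BOD": 4.0,
    "COD": 10.0,
    "TSS": 20.0,
    "NO2-N": 0.05,
    "NO3-N": 2.0,
    "Fe": 0.5,
    "Cl": 250.0,
    "Pb": 0.02,
    "Cd": 0.005,
    "As": 0.01,
}
DO_MIN = 6.0  # DO must be at least 6 mg/L


def exceedances(sample):
    """Return the parameters in `sample` that violate the quoted limits."""
    flagged = []
    for name, value in sample.items():
        if name == "DO":
            if value < DO_MIN:  # DO is a minimum requirement
                flagged.append(name)
        elif name in MAX_LIMITS and value > MAX_LIMITS[name]:
            flagged.append(name)
    return flagged


# Mean values reported in the text above (mg/L).
reported_means = {"COD": 17.5, "TSS": 36.0, "NO3-N": 0.8, "Fe": 1.8}
print(exceedances(reported_means))  # ['COD', 'TSS', 'Fe']
```

On the reported means, COD, TSS and Fe exceed their limits while NO3−-N does not, which matches the description in the text.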

Evaluating the sampling sites and frequencies of the surface water quality monitoring
Water quality in the South of Vietnam varies greatly. Based on the Euclidean distance, the water quality at 58 sampling sites was classified into 15 clusters (Fig. 4).
The similarity of water quality among the sites in each cluster suggests that the number of monitoring sites can be reduced in the future. The water quality characteristics of the identified clusters are presented in Table 3. Based on the Euclidean distance, water quality by sampling period was classified into 5 clusters (Fig. 5). Cluster I includes April (end of the dry season); clusters II, III and IV include May and June, July and August, and September, respectively, all of which belong to the rainy season; and cluster V comprises two months at the beginning of the dry season (November and December). The results show that water quality fluctuates greatly over time, with the dry season months tending to separate from the rainy season months. Pollution characteristics of the identified clusters are presented in Table 4. In addition, the entropy weight of a parameter represents the effect of that parameter on the water quality of a cluster. Cluster I was characterized by indicators in descending order NH

Identifying key water parameters influencing water quality
The PCA results show that water quality in the southern region of Vietnam is affected by many sources of pollution. PC1-5 are considered the main sources because they have eigenvalues > 1, and together they explain 87.3% of the variation in water quality. Meanwhile, PC6-13, as secondary sources, explain only 12.7% of the variation (Table 5). In addition, Table 5 shows that the loadings of the main PCs were weakly correlated with temperature, pH, TDS, EC, DO, BOD, COD, NH4+, NO2−, and Fe, moderately correlated with pH, NH4+, and NO3−, and strongly correlated with Cl−. The secondary PCs had moderate to strong correlations with all parameters except Cl− (Table 5).
The water quality indicators in the southern region are affected by many pollution sources. For example, temperature had weak positive loadings on PC2, PC5 and PC6 and a moderate loading on PC7, which suggests about four potential sources of temperature variation. Similarly, pH was weakly affected by 3 factors and moderately affected by 1 factor; EC and TDS were weakly affected by 2 factors and moderately affected by 1 factor. DO was weakly affected by PC1, PC8 and PC9 and moderately by PC10. BOD was weakly affected by PC1 and strongly by PC12, while COD was weakly affected by PC1 and PC12 but moderately affected by PC11. TSS was weakly affected by 1 factor and moderately affected by 3 factors (Table 5). NH4+ was moderately affected by PC11 but weakly affected by PC1, PC3, and PC12. NO3− was moderately affected by PC3 but weakly affected by PC7 and PC9. NO2− was weakly affected by 4 factors and moderately affected by 1 factor. Fe was weakly affected by PC4 and PC9 and moderately affected by PC7. Cl− was strongly affected by PC7 and weakly affected by PC7.

Evaluating surface water quality in southern Vietnam
Temperature fluctuations are insignificant because water has a temperature-regulating function [1,25]. Temperature is in the range suitable for aquatic organisms [26]. The pH in the study areas is affected by wastewater and the hydrological regime. pH in major rivers in the Mekong Delta is usually neutral, except in some coastal provinces where it has been found to be alkaline [1,19,25,27,28]. Seawater influence in these coastal areas also results in high EC and TDS in estuaries, where brackish and saline water intrusion is possible. In addition, TDS and EC are affected by inorganic substances present in wastewater or effluents discharged into receiving waters [1,4,22].
At many locations DO is very low due to the impact of wastewater containing organic compounds and NH4+ [22,29]. The fluctuations of DO are due to the presence of organic matter, the presence of algae, air diffusion and pH [4,26]. Low DO together with high BOD and COD indicates that surface water in the south of Vietnam is organically polluted, especially in Dong Nai, Ho Chi Minh City and Long An, where industrial parks are crowded and transport and fishing ports are busy. Previous studies in the Mekong Delta also showed that the water was organically polluted and had seasonal fluctuations [1,19,27,28,30,31]. NH4+ is present in wastewater and in domestic, agricultural, industrial, and landfill wastes [4,17,22,32]. High NH4+ is found in the Dong Nai, Ho Chi Minh City and Long An areas, where there are many industrial zones and fishing ports. The findings show that water bodies are contaminated with nutrients, potentially leading to eutrophication of the water environment [33]. Nutrient pollution is a water quality issue of concern in the Mekong Delta waters, mainly caused by wastewater from treatment systems and agricultural production [4,30,31]. Iron in the aquatic environment of the study area was presumed to come from natural sources (acid sulfate soils) or industrial wastewater [4,9,17,22,34]. The presence of Fe in surface waters is a common problem in the Mekong Delta [1,4,31]. Other heavy metals are often derived from agricultural and industrial wastewater [17,22,35,36].
The results show that water quality varies greatly by sampling location; the most polluted locations are in Dong Nai, Ho Chi Minh City and Long An, where there are many navigation activities, industrial zones, and fishing ports. DO, TSS, BOD, COD, NH4+, Fe, EC, TDS, and Cl− fluctuated seasonally. The water in the southern part of Vietnam is contaminated with organic matter (low DO while TSS, BOD, and COD are high), nutrients (mainly NH4+) and Fe. Pb at some locations exceeded the permissible limit. However, heavy metals such as Cd and As were within the allowable limits of 0.005 and 0.01 mg L−1, respectively [7]. This result is consistent with previous studies in the Mekong Delta, which found water contaminated with organic matter, Fe, nutrients, and even microorganisms [1,4,9,19,25,27,30,31,37].
The WQI results likewise show that water quality in the Dong Nai, Ho Chi Minh City and Long An areas is worse than in other areas. The reason may be that this area receives wastes from socio-economic development activities while wastewater treatment systems are inadequate and ineffective [4,17,22,35]. According to Nhien and Nguyen [32], bad water quality at such locations is due to the effects of agricultural production and landfills, whereas locations with good water quality were less affected by these pollution sources. Water quality in large rivers is better than in tributaries [28,31,38,39].

Evaluating the sampling sites and frequencies of the surface water quality monitoring
The results of CA are consistent with the calculated WQI values. This has also been reported in previous studies [28,40]. Clusters 1 to 6 have very good water quality, clusters 7 to 14 are characterized by poor to moderate overall water quality, and cluster 15 is characterized by good overall water quality. The results show that water quality varies greatly by sampling site, mainly due to industrial production, agriculture, transportation, and marine economic activities. Previous studies have shown that locations with the same water quality in the same water body can be considered for removal to save on monitoring costs [5,10,11,21]. This study proposes reducing the number of sampling sites in clusters 1 to 6 because of their similar water quality. The sites in these clusters could be cut by one third, from 33 to 22 locations, which would reduce monitoring costs by about 19%. Alternatively, these 11 locations could be moved to other potentially polluted areas to detect pollution problems promptly.
For the clustering by sampling frequency, previous studies also found that water quality was clustered by season [10,13,14,28,39]. The order of importance of the indicators changes with the seasons; that is, these indicators are affected by the observation period. For clusters with similar water characteristics, a reduction in sampling frequency may be considered. The results show that one sampling event could be dropped from each of Cluster II, Cluster III and Cluster V. Thus, this study recommends reducing the monitoring frequency from 8 to 5 times per year (April, June, August, September, November), depending on funding and human resources. This reduces monitoring costs by about 37.5%. Chounlamany et al. (2017) [10] also proposed reducing the sampling frequency for surface water quality monitoring based on the results of temporal CA.
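The cost figures quoted in this section and in the abstract can be reproduced with simple arithmetic. The sketch below assumes that the 19% refers to the 11 removed sites as a share of all 58 sites, that the 37.5% refers to the 3 dropped sampling events out of 8, and that the 56.5% in the abstract is the sum of the two; this reading is an interpretation and is not stated explicitly in the text.

```python
# Figures taken from the text.
total_sites = 58
sites_before, sites_after = 33, 22  # clusters 1-6
events_before, events_after = 8, 5  # sampling events per year

# ASSUMPTION: site saving is expressed relative to all 58 sites.
site_saving = (sites_before - sites_after) / total_sites
# ASSUMPTION: frequency saving is the share of dropped sampling events.
frequency_saving = (events_before - events_after) / events_before
# ASSUMPTION: the combined figure in the abstract is a simple sum.
combined_saving = site_saving + frequency_saving

print(f"sites: {site_saving:.1%}")           # sites: 19.0%
print(f"frequency: {frequency_saving:.1%}")  # frequency: 37.5%
print(f"combined: {combined_saving:.1%}")    # combined: 56.5%
```

Under these assumptions the three percentages in the paper are mutually consistent.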

Identifying key water parameters influencing water quality
In this study, changes in climate and weather are considered to be the main factors causing temperature fluctuations in surface water. The study area has a monsoon climate with two distinct seasons, which was also reflected in the CA results. Furthermore, because the study area has urbanized rapidly, thermal runoff pollution is another possible cause. In urbanized areas, natural infiltration of precipitation into the soil is limited, so rainwater flows over surfaces made of heat-absorbing materials and eventually into the canals. Temperature fluctuations caused by impervious surfaces have also been studied previously by Yang et al. (2021) [41]. In addition, pH changes may be due to the characteristics of wastewater and seawater [4,22]. EC, TDS, and Cl− are mainly affected by saline intrusion and dissolved ions in wastewater [4,17,22], and TSS by riverbank erosion and plankton [1,4,25,42]. Organic substances represent the discharge processes [29], while nutrients (NH4+, NO2− and NO3−) represent the contributions of hydrological factors and of domestic, agricultural and industrial wastewater. Agricultural cultivation (Southwest) and industrial production (Northeast) are two important activities of the region, which also attract migration from other provinces; therefore, they can be identified as two potential sources of pollution in the area. These factors have also been reported in several previous studies [17,22,27,35,43]. Fe was derived from natural sources and industrial wastewater [17,22,28,31], while other heavy metals (Pb, As, Cd) were derived from industrial waste [4,17].
The findings show that temperature, pH, TDS, EC, DO, BOD, COD, NH4+, NO2−, Fe, Cl−, and Pb need to be monitored, while Hg, As, and Cd may not need to be monitored, or may be monitored less frequently, because their concentrations are always within the allowable level or below the detection limit. NO3− may also not need to be monitored because it can be inferred from NH4+, NO2− and DO through in-stream nitrification. In addition, NO3− is less harmful to the environment than NH4+ and NO2− in the study areas. EC and TDS are closely related, so only one of these two indicators needs to be selected for monitoring. Important water quality indicators such as orthophosphate, total phosphate, total nitrogen, and coliforms were not observed in this study and need to be added to the future monitoring program for a more comprehensive assessment of surface water quality in the study area.

Conclusions
The results show that water quality varies greatly by sampling location, and the most polluted locations are in Dong Nai, Ho Chi Minh City and Long An, where there are many navigation activities, industrial parks, and fishing ports. DO, TSS, BOD, COD, NH4+-N, Fe, EC, TDS, and Cl− fluctuate seasonally. The water was contaminated with organic matter (low DO and high TSS, BOD, and COD), nutrients (mainly NH4+-N) and Fe. Pb at some locations exceeded the allowable limit. Heavy metals such as Cd, Hg and As are within the allowable limits of QCVN 08-MT:2015/BTNMT, column A1. The WQI classifies water quality from bad to very good (WQI = 42–100); good water quality is concentrated in Binh Phuoc, An Giang, Tien Giang, Ben Tre, and Dong Thap provinces, while average and bad water quality is concentrated in Dong Nai, Ho Chi Minh City, and Long An. CA classifies water quality into 15 clusters, in which clusters 1–6 have very good water quality, cluster 15 has good water quality, and clusters 7–14 have poor to moderate water quality. This study proposes to reduce the number of monitoring points in clusters 1–6 by one third, from 33 to 22 locations, reducing monitoring costs by about 19%. By sampling period, water quality is classified into 5 clusters. It is recommended that the monitoring frequency be reduced from 8 to 5 times per year, which would reduce monitoring costs by about 37.5%. The PCA results identified 5 major potential sources explaining 87.3% and 8 minor sources explaining only 12.7% of water quality variation. Temperature, pH, EC, DO, BOD, COD, NH4+-N, NO2−-N, Fe, Cl−, and Pb need to be continuously monitored. Orthophosphate, total phosphate, total nitrogen, and coliforms need to be added to the future monitoring program to assess surface water quality in the study area more comprehensively.
The study provides important scientific information that can support future surface water quality monitoring.