Education quality management based on educational statistics with data mining methods

The Kazakh-American Free University Academic Journal №6 - 2014

Bogorodskaya Oxana, Kazakh American Free University, Kazakhstan
Smailova Saule, D. Serikbaev East Kazakhstan State Technical University, Kazakhstan
Uvalieva Indira, D. Serikbaev East Kazakhstan State Technical University, Kazakhstan


An analysis of the current state of education has shown that reliable information about actual educational processes and education quality is needed to pursue an effective education policy, make decisions, and implement education reforms. To this end, a unique education statistics database built on original data has been created. Under the State Program of Education Development in the Republic of Kazakhstan for 2011-2020, “The Base of Education Statistics Data” was created at the national level. This database contains 1659 indices and 1227 indicators covering all levels of education [1].


Current approach

At present, a Rating of regions is conducted in the Republic of Kazakhstan to analyze and evaluate regional education systems. The Rating is based on indicators of education quality at the regional level.

Each index is determined by the most significant indicators influencing the Rating of regions. Using the expert estimation method, the Rating scale is divided into 44 grades according to the weight coefficients that express the significance of the indices.

More than 100 representatives of the education system from different regions of the Republic of Kazakhstan took part in a survey to determine the weight coefficients expressing the significance of the indices. The weight coefficient of each index was assessed on a two-grade scale, and the average weight coefficients of significance were then computed for all indices and indicators. The Rating grade was calculated by the following formula (1):


where the terms are, in order, the resulting Rating grade of the region; the Rating grade for education availability; the quality of financial, material and technical resources; the quality of personnel; and the results of the activities of education institutions [2].
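The text of formula (1) is not reproduced here, so the following Python sketch only illustrates one plausible way to compute such a composite grade. It assumes, without confirmation from the source, that each expert scores every index as 0 or 1 on the two-grade scale, that an index's weight coefficient is the mean expert score, that a component grade is the weighted sum of normalized index values, and that the overall Rating grade is the plain sum of the four component grades. The function names and all numbers are hypothetical.

```python
import numpy as np


def average_weights(expert_scores: np.ndarray) -> np.ndarray:
    """Mean weight coefficient of each index over all experts.

    expert_scores has shape (n_experts, n_indices). ASSUMPTION: the
    two-grade scale is encoded as 0 (not significant) / 1 (significant).
    """
    return expert_scores.mean(axis=0)


def component_grade(index_values: np.ndarray, weights: np.ndarray) -> float:
    """Weighted sum of normalized index values for one Rating component."""
    return float(np.dot(index_values, weights))


def rating_grade(components: dict) -> float:
    """Overall Rating grade; ASSUMED to be the plain sum of component grades."""
    return float(sum(components.values()))


if __name__ == "__main__":
    # Hypothetical scores of 4 experts for the 3 indices of one component.
    scores = np.array([[1, 0, 1], [1, 1, 1], [0, 1, 1], [1, 1, 1]])
    w = average_weights(scores)  # mean weights: 0.75, 0.75, 1.0
    availability = component_grade(np.array([0.8, 0.6, 0.9]), w)
    total = rating_grade({
        "availability": availability,  # education availability
        "resources": 1.2,              # financial, material and technical resources
        "personnel": 1.5,              # personnel quality
        "results": 1.1,                # results of institutions' activities
    })
    print(round(total, 2))  # 5.75
```

Replacing the plain sum with another aggregation only changes `rating_grade`; the weighting step stays the same.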

The proposed approach

Effective management of the education process requires the development and use of an information and analytical system, and this task now faces the education system of the Republic of Kazakhstan. It involves modern technologies for gathering diverse data, organizing data and preparing it for analysis, as well as modern methods of analysis, modeling, forecasting and decision making [3]. These tasks should be addressed by Knowledge Discovery in Databases (KDD), the process of transforming data into knowledge. KDD comprises data preprocessing, Data Mining (DM) methods, and the interpretation of the discovered regularities by an expert.

Applying data mining methods to the processing of education data makes it possible:

- to identify objective regularities and trends in education data;

- to present these regularities in diagrams and interactive visualizations;

- to create reports for presentations and business analytics;

- to analyze correlations and build forecasts from education statistics data.
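As a small illustration of correlation analysis and forecasting, the sketch below computes a Pearson correlation matrix between indicators and extrapolates one indicator with a least-squares linear trend. The article does not say which correlation or forecasting methods are used; the linear trend, the function names and the data are assumptions for illustration only.

```python
import numpy as np


def correlation_matrix(data: np.ndarray) -> np.ndarray:
    """Pearson correlations between indicators (columns); rows are regions."""
    return np.corrcoef(data, rowvar=False)


def linear_trend_forecast(years, values, target_year: float) -> float:
    """Fit a least-squares straight line through (year, value) and extrapolate.

    A linear trend is only an illustrative choice of forecasting model.
    """
    slope, intercept = np.polyfit(years, values, deg=1)
    return float(slope * target_year + intercept)


if __name__ == "__main__":
    # Hypothetical values of two indicators (columns) for three regions (rows).
    data = np.array([[12.0, 250.0], [15.0, 310.0], [9.0, 190.0]])
    print(correlation_matrix(data))
    # Hypothetical yearly values of one indicator, forecast for 2013.
    print(linear_trend_forecast([2009, 2010, 2011, 2012], [10.0, 12.0, 14.0, 16.0], 2013))
```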

Data mining of education statistics using factor and cluster analysis

The following two-step algorithm is proposed. The first step generates generalized indices; the second step compares them and divides the objects into groups. Let us examine the algorithm in detail.

For the first step, factor analysis by the principal component method is used. Let us build the model:


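$$x_i = \sum_{j=1}^{m} a_{ij} f_j + e_i, \qquad i = 1, \dots, n. \qquad (2)$$

Here $x_i$ are the observable variables (original features); $f_j$ are the unobservable common factors; $a_{ij}$ are the factor loadings; $e_i$ is the random error.

The factors $f_j$ and the errors $e_i$ are uncorrelated; moreover, each error $e_i$ is associated only with its own variable $x_i$ and has zero mean and variance $d_i^2$. The errors are mutually uncorrelated random variables with zero mean, and the factors are uncorrelated random variables with zero mean and unit variance: $\operatorname{cov}(\mathbf{f}) = E$ ($E$ is the identity matrix). Hence, for standardized variables,

$$\operatorname{var}(x_i) = \sum_{j=1}^{m} a_{ij}^2 + d_i^2 = h_i^2 + d_i^2 = 1. \qquad (3)$$

Here $h_i^2$ is the communality of the $i$-th variable, i.e. the part of its variance determined by the common factors; $d_i^2$ is the part of the variance determined by the error.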
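To make the factor model $x_i = \sum_j a_{ij} f_j + e_i$ concrete, this sketch draws synthetic standardized data with hypothetical loadings and compares the sample covariance with the matrix the model implies, $AA^{T} + \operatorname{diag}(d_i^2)$ with $d_i^2 = 1 - h_i^2$. Normally distributed factors and errors are an assumption of the sketch; the loadings and function names are illustrative.

```python
import numpy as np


def implied_correlation(loadings: np.ndarray) -> np.ndarray:
    """Correlation matrix implied by the model: A A^T + diag(d_i^2)."""
    communality = (loadings ** 2).sum(axis=1)                   # h_i^2
    return loadings @ loadings.T + np.diag(1.0 - communality)   # d_i^2 = 1 - h_i^2


def simulate(loadings: np.ndarray, n_obs: int, seed: int = 0) -> np.ndarray:
    """Draw n_obs observations of x = A f + e with unit-variance x_i."""
    rng = np.random.default_rng(seed)
    n_vars, m = loadings.shape
    uniqueness = 1.0 - (loadings ** 2).sum(axis=1)   # d_i^2, must be >= 0
    f = rng.standard_normal((n_obs, m))              # cov(f) = E (identity)
    e = rng.standard_normal((n_obs, n_vars)) * np.sqrt(uniqueness)
    return f @ loadings.T + e


if __name__ == "__main__":
    A = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]])  # hypothetical
    X = simulate(A, 100_000)
    print(np.round(np.cov(X, rowvar=False), 2))
    print(np.round(implied_correlation(A), 2))
```

The diagonal of the implied matrix is exactly 1, which is the decomposition of variance into communality plus error variance.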

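In matrix form, the model is

$$\mathbf{x} = A\mathbf{f} + \mathbf{e}, \qquad (4)$$

where $\mathbf{x}$, $\mathbf{f}$ and $\mathbf{e}$ are the vectors of variables, factors and errors, and $A = (a_{ij})$ is the $n \times m$ matrix of factor loadings.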
The factor equation is solved by the principal component method, applied to the reduced correlation matrix $R^{+}$, which has the communalities on its main diagonal. The communalities are estimated with the multiple correlation coefficient between each variable and all the other variables.
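A minimal sketch of building the reduced correlation matrix $R^{+}$. The text speaks of the multiple correlation coefficient; this sketch assumes the common practice of using its square (the squared multiple correlation, $1 - 1/(R^{-1})_{ii}$) as the initial communality estimate, and it requires $R$ to be invertible. The function name is hypothetical.

```python
import numpy as np


def reduced_correlation_matrix(R: np.ndarray) -> np.ndarray:
    """Return R+ : a copy of R with communality estimates on the diagonal.

    ASSUMPTION: the estimate is the squared multiple correlation of each
    variable with all the others, SMC_i = 1 - 1 / (R^{-1})_{ii}.
    """
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_plus = R.copy()                 # leave the input matrix untouched
    np.fill_diagonal(R_plus, smc)
    return R_plus


if __name__ == "__main__":
    R = np.array([[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]])  # hypothetical
    print(np.round(reduced_correlation_matrix(R), 3))
```

For two variables the squared multiple correlation reduces to $r^2$, which is a quick sanity check of the formula.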

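Factor analysis is based on the following equation (5):

$$R^{+} = R - D^{2} = A A^{T}, \qquad (5)$$

where $R$ is the correlation matrix of the variables, $A$ is the matrix of factor loadings, and $D^{2} = \operatorname{diag}(d_1^2, \dots, d_n^2)$ is the diagonal matrix of error variances.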
Solving it yields the eigenvalues of $R^{+}$, collected in the diagonal matrix $\Lambda$, and the matrix of eigenvectors $V$; the factor pattern matrix is then determined as $A = V \Lambda^{1/2}$, using the $m$ largest eigenvalues.
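The eigen-decomposition step can be sketched with NumPy's symmetric eigensolver `numpy.linalg.eigh`: keep the $m$ largest eigenvalues and scale each eigenvector by the square root of its eigenvalue, giving $A = V\Lambda^{1/2}$. This is an illustrative sketch with a hypothetical function name, not the authors' code; eigenvectors are defined only up to sign, so columns of $A$ may come out negated.

```python
import numpy as np


def factor_pattern(R_plus: np.ndarray, m: int):
    """Factor pattern A = V Lambda^{1/2} from the m largest eigenpairs of R+."""
    eigvals, eigvecs = np.linalg.eigh(R_plus)     # symmetric matrix, ascending order
    top = np.argsort(eigvals)[::-1][:m]           # indices of the m largest
    lam = eigvals[top]
    V = eigvecs[:, top]
    # Multiply column j of V by sqrt(lambda_j); negative eigenvalues, which a
    # reduced matrix can have, are clipped to zero.
    A = V * np.sqrt(np.clip(lam, 0.0, None))
    return A, lam


if __name__ == "__main__":
    a = np.array([0.9, 0.8, 0.7])                 # hypothetical one-factor loadings
    R_plus = np.outer(a, a)                       # exact rank-one reduced matrix
    A, lam = factor_pattern(R_plus, 1)
    print(np.round(np.abs(A[:, 0]), 3), np.round(lam, 3))
```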

To determine the communalities and factor loadings, an empirical iterative algorithm is used; it amounts to successive refinement of the parameter estimates. The algorithm is as follows. Initial estimates of the factor loadings are obtained by the principal factor method. From the correlation matrix $R$, the principal components and the unique factors are estimated:

$$\mathbf{p}_j = \frac{1}{\lambda_j} \sum_{i=1}^{n} a_{ij}\,\mathbf{x}_i, \qquad j = 1, \dots, m, \qquad (6)$$

where $\mathbf{p}_j$ are the principal components (column vectors); $a_{ij}$ are the coefficients (loadings) of the common factors; $\mathbf{x}_i$ are the original data (column vectors); $\lambda_j$ is the corresponding eigenvalue of matrix $R$.

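The communalities are then estimated as

$$h_i^2 = \sum_{j=1}^{m} a_{ij}^2, \qquad (7)$$

and the procedure is repeated with the updated communalities on the diagonal of $R^{+}$ until the estimates stabilize.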
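The iterative procedure can be sketched as iterated principal axis factoring, one common way to implement such a scheme; the text does not give an exact stopping rule, so the tolerance and iteration limit here are assumptions. `principal_component_scores` computes standardized component scores $p_j = \frac{1}{\lambda_j}\sum_i a_{ij} x_i$ from standardized data, and `iterated_principal_factors` alternates between placing communality estimates on the diagonal of $R^{+}$, extracting loadings, and recomputing communalities as $h_i^2 = \sum_j a_{ij}^2$. Both function names are hypothetical.

```python
import numpy as np


def principal_component_scores(X: np.ndarray, m: int) -> np.ndarray:
    """Unit-variance principal components: p_j = (1/lambda_j) * sum_i a_ij x_i."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardized original data x_i
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    top = np.argsort(eigvals)[::-1][:m]
    lam = eigvals[top]
    A = eigvecs[:, top] * np.sqrt(lam)           # loadings a_ij
    return Z @ A / lam                           # column j divided by lambda_j


def iterated_principal_factors(R: np.ndarray, m: int, tol: float = 1e-6, max_iter: int = 1000):
    """Iterate communalities -> R+ -> loadings -> communalities until stable."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # initial estimate (squared multiple correlation)
    for _ in range(max_iter):
        R_plus = R.copy()
        np.fill_diagonal(R_plus, h2)
        eigvals, eigvecs = np.linalg.eigh(R_plus)
        top = np.argsort(eigvals)[::-1][:m]
        A = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = (A ** 2).sum(axis=1)            # communalities, h_i^2 = sum_j a_ij^2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return A, h2


if __name__ == "__main__":
    a = np.array([0.9, 0.8, 0.7, 0.6])           # hypothetical true loadings
    R = np.outer(a, a) + np.diag(1.0 - a ** 2)   # correlation matrix of a one-factor model
    A, h2 = iterated_principal_factors(R, m=1)
    print(np.round(np.abs(A[:, 0]), 3), np.round(h2, 3))
```

For a correlation matrix generated exactly by a one-factor model, the iteration should recover the true loadings up to sign.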

The following criteria are used to determine the number of common factors: the number of factors can be judged by how meaningfully the factors can be interpreted; the number of common factors $m$ can be taken equal to the number of eigenvalues greater than or equal to one; or the number of factors can be chosen so that they explain a specified share of the total variance [4].
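The second and third criteria are easy to compute from the eigenvalues of the correlation matrix. This hypothetical sketch counts eigenvalues that are at least one (the Kaiser criterion) and finds the smallest number of factors whose cumulative share of the total variance reaches a chosen threshold; the threshold itself is the analyst's choice.

```python
import numpy as np


def kaiser_count(R: np.ndarray) -> int:
    """Number of eigenvalues of R that are greater than or equal to one."""
    return int(np.sum(np.linalg.eigvalsh(R) >= 1.0))


def count_for_variance_share(R: np.ndarray, share: float) -> int:
    """Smallest number of factors explaining at least `share` of total variance."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]      # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()     # cumulative variance share
    # First position where the cumulative share reaches the threshold,
    # capped at the number of variables.
    return min(int(np.searchsorted(cumulative, share)) + 1, len(eigvals))


if __name__ == "__main__":
    R = np.full((3, 3), 0.5)                            # hypothetical equicorrelated matrix
    np.fill_diagonal(R, 1.0)
    print(kaiser_count(R), count_for_variance_share(R, 0.8))
```

For three variables with all pairwise correlations equal to 0.5 the eigenvalues are 2, 0.5 and 0.5, so the Kaiser criterion keeps one factor, while an 80% variance threshold requires two.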

Education statistics on general secondary education in East Kazakhstan for 2012 were processed and analyzed, covering 17 regions (excluding the cities of Ust-Kamenogorsk and Semey) [5]. The dataset comprises 22 variables in total; they can be called original variables because they represent the source data. Next, the data had to be normalized so that they do not depend on absolute quantitative values; the resulting variables are called secondary data, and 17 such variables were obtained. Factorization (data reduction) yielded 6 principal factors, which together explain 80.5% of the total variance (Table 1).
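The article does not say exactly how the secondary variables were derived. As an assumption, this sketch builds three size-independent ratios that appear among the indices interpreted in the text (student-teacher ratio, average school size, students per class) from hypothetical absolute counts, and then z-standardizes the columns, a usual preparation before factor analysis on a correlation matrix. The function names are hypothetical.

```python
import numpy as np


def relative_indicators(students, teachers, schools, classes) -> np.ndarray:
    """Turn absolute counts per region into size-independent ratios."""
    students = np.asarray(students, dtype=float)
    return np.column_stack([
        students / np.asarray(teachers, dtype=float),   # students per teacher
        students / np.asarray(schools, dtype=float),    # average school size
        students / np.asarray(classes, dtype=float),    # students per class
    ])


def standardize(X: np.ndarray) -> np.ndarray:
    """z-scores: zero mean and unit (population) standard deviation per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


if __name__ == "__main__":
    # Hypothetical counts for three regions.
    X = relative_indicators([1200, 3400, 800], [110, 250, 90], [8, 15, 10], [60, 140, 55])
    print(np.round(standardize(X), 3))
```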

The factors are interpreted using the rotated component matrix (Table 2).
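A rotated component matrix is usually obtained with an orthogonal rotation such as varimax; the text does not name the rotation, so varimax is an assumption here. The sketch below is the standard SVD-based varimax iteration: it rotates the loadings by an orthogonal matrix $T$ so that each variable loads strongly on as few factors as possible, which leaves the communalities unchanged. The function name and parameters are illustrative.

```python
import numpy as np


def varimax(A: np.ndarray, gamma: float = 1.0, max_iter: int = 500, tol: float = 1e-10):
    """Orthogonally rotate loadings A (n_vars x m) to maximize the varimax criterion.

    Returns the rotated loadings A @ T and the orthogonal rotation matrix T.
    """
    p, k = A.shape
    T = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Target matrix of the orthomax family; gamma = 1 gives varimax.
        target = L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0))
        u, s, vt = np.linalg.svd(A.T @ target)
        T = u @ vt                                   # closest orthogonal matrix
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):    # no meaningful improvement
            break
        criterion = new_criterion
    return A @ T, T


if __name__ == "__main__":
    A0 = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])  # hypothetical simple structure
    c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
    A = A0 @ np.array([[c, -s], [s, c]])             # the same structure rotated by 30 degrees
    rotated, T = varimax(A)
    print(np.round(rotated, 3))
```

Applied to a simple structure that has been rotated away, the rotation recovers it up to column order and sign.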

The six factors are interpreted as follows:

- Factor 1 is associated with the student-teacher ratio, the average school size, and the number of students per class. It can be characterized as the network factor.

- Factor 2 is associated with the share of teachers with higher education, the share of teachers of the 1st, 2nd and highest categories, and the share of teachers over 60. It is the resources factor.

- Factor 3 includes the share of local budget expenditures on teachers' wages, the share of expenditures on purchasing and delivering books, and the share of general secondary school administrators who have completed refresher courses. It can be called the management factor.

- Factor 4 includes the share of schools in emergency condition, the share of schools in standard-design buildings, and the share of schools holding classes in the afternoon shift. It is the conditions factor.

- Factor 5 includes the share of schools with unlimited Internet traffic, the share of schools with language and multimedia laboratories, and the number of students per computer. It is the informatization factor.

- Factor 6 is associated with the number of students receiving additional education and the number of students engaged in extracurricular activities. It can be called the factor of conditions for students' individual needs (the additional education factor).

An important feature of factor analysis is that, unlike the original variables, the resulting factors are uncorrelated. Under multicollinearity it is therefore more sensible to build the regression equation on the principal components: they are linear functions of all the original variables, yet they are uncorrelated with one another.
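The claim that principal components are uncorrelated even when the original variables are strongly collinear can be checked numerically. In this sketch, with hypothetical function names and synthetic data, two nearly collinear variables are transformed into component scores whose sample correlation is essentially zero, and a regression is then fitted on the leading components with ordinary least squares.

```python
import numpy as np


def principal_component_scores(X: np.ndarray) -> np.ndarray:
    """Scores of all principal components of standardized X, largest variance first."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return Z @ eigvecs[:, order]      # each column is a linear function of all variables


def regress_on_components(scores: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Least-squares coefficients [intercept, b_1, ..., b_k] of y on the first k components."""
    design = np.column_stack([np.ones(len(y)), scores[:, :k]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.standard_normal(200)
    x2 = x1 + 0.05 * rng.standard_normal(200)   # almost collinear with x1
    X = np.column_stack([x1, x2])
    S = principal_component_scores(X)
    print(np.round(np.corrcoef(X, rowvar=False)[0, 1], 3))   # close to 1
    print(np.round(np.corrcoef(S, rowvar=False)[0, 1], 6))   # essentially 0
```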

The second step is classification by the generalized indices (the principal components) obtained by factor analysis.

Cluster analysis methods were used for this classification. The advantage of cluster analysis is that objects can be classified by several variables at once rather than by a single one. It makes it possible not only to identify groups of similar objects, but also to partition the objects into clusters and to interpret the resulting partition.

As a result, the regions were divided into 4 clusters (Table 3); the table also gives the mean values of the corresponding factors for each cluster.
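The text does not state which clustering algorithm was applied, so the following is only a stand-in: Lloyd's k-means on a matrix of factor scores (rows are regions, columns are factors), with $k = 4$ as in the reported result and a deterministic farthest-first initialization. The returned centers are the per-cluster mean factor values. All names and the example points are hypothetical.

```python
import numpy as np


def farthest_first_centers(X: np.ndarray, k: int) -> np.ndarray:
    """Start from the first row, then repeatedly add the point farthest from
    all centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[np.argmax(d)])
    return np.array(centers, dtype=float)


def kmeans(X: np.ndarray, k: int, max_iter: int = 100):
    """Lloyd's k-means: returns cluster labels and centers (mean factor values)."""
    centers = farthest_first_centers(X, k)
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # nearest center for each region
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):    # assignments have stabilized
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Hypothetical two-factor scores forming four well-separated groups.
    pts = np.array([[0, 0], [0.5, 0], [0, 0.5], [10, 0], [10.5, 0], [10, 0.5],
                    [0, 10], [0.5, 10], [0, 10.5], [10, 10], [10.5, 10], [10, 10.5]], dtype=float)
    labels, centers = kmeans(pts, 4)
    print(labels)
    print(np.round(centers, 3))
```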

The average indices for the 4 clusters are shown in Figure 1.

Figure 1. Average indices for each cluster (the numbers on the graphs are cluster numbers)

Note the distances between the centers of the four clusters (Table 4).

Clusters 2, 3 and 4 are roughly equally distant from one another: the Euclidean distances between them are 3.045, 3.195 and 3.373. Cluster S1 is the most significant in terms of living standards; the distances from S1 to S2, S3 and S4 are 2.98, 2.207 and 2.676, respectively.
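The distances in Table 4 are Euclidean distances between cluster centers in factor space. Since the center coordinates are not reproduced in the text, this sketch only shows how such a distance matrix can be computed from any array of centers; the function name and example values are hypothetical.

```python
import numpy as np


def center_distances(centers: np.ndarray) -> np.ndarray:
    """Matrix of Euclidean distances between cluster centers (rows of `centers`)."""
    diff = centers[:, None, :] - centers[None, :, :]   # pairwise coordinate differences
    return np.sqrt((diff ** 2).sum(axis=2))


if __name__ == "__main__":
    centers = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])   # hypothetical centers
    print(center_distances(centers))
```

With centers $(0, 0)$, $(3, 4)$ and $(6, 8)$ the off-diagonal distances are 5, 10 and 5.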

The first cluster is characterized by a high level of “The management factor”, while “The conditions factor” and “The network factor” are negative. The second cluster is characterized by the highest levels of all indices except “The informatization factor”; it consists of two industrially developed regions (Ziryanovsk and Kurchatov). The third cluster consists of four regions and has high levels of “The network factor” and “The additional education factor”. The fourth cluster consists of two regions with high levels of “The resources factor”, “The conditions factor” and “The management factor”, while the other factors are negative.


Thus, the current approach to education quality management in the Republic of Kazakhstan has a number of drawbacks, such as the need to involve more than 100 educators from different regions of the country. The rating approach only orders a set of objects; it does not measure the degree of imbalance between them. Therefore, a new approach to education quality management based on education statistics and data mining methods was developed. The resulting information and analytical system for education quality management includes an information and reference system, tools for monitoring education processes, instruments for analyzing and evaluating education quality, decision-making support for improvement measures in education, and planning of preventive and corrective actions.


1. «Ob utverzhdenii Gosudarstvennoy programmy razvitiya obrazovaniya Respubliki Kazakhstan na 2011 - 2020 gody. Ukaz Prezidenta Respubliki Kazakhstan ot 7 dekabrya 2010 goda № 1118» [Decree of the President of the Republic of Kazakhstan No. 1118 of December 7, 2010 on Approval of the National Program of Education Development for 2011-2020], Kazakhstanskaya pravda 338 (26399), 2-4 (2010)

2. Kultumanova, A., Nogaybaeva, G., Kussidenova, G., Yessinbayeva, Zh., Sadykova, Zh., Natsionalny doklad o sostoyanii i razvitii sistemy obrazovaniya Respubliki Kazakhstan, 2012 god [National Report on the State and Development of the Education System of the Republic of Kazakhstan, 2012], NCOSO, Astana, 35-93 (2013)

3. Konstantinovsky, D.L., Ot sbora statisticheskikh dannykh – k informatsionnomu obespecheniyu prinyatiya resheniy [From gathering statistical data to informational support of decision making], Logos, Moskva, 119-131 (2006)

4. Zhukovskaya, V.M., Muchnik, I.B., Faktorny analiz v sotsialno-ekonomicheskikh issledovaniyakh [Factor analysis in social and economic research], Statistika, Moskva, 82-91 (1976)

5. Argyngazin, D.S., Matkarimova, G.A., Tokarev, N.Yu., Semenova, L.V., Assanova, M.I., Baykhonova, S.Z., Rovnyakova, I.V., Doklad o sostoyanii i razvitii obrazovaniya v VKO [Report on the current state and development of education in East Kazakhstan], Ust-Kamenogorsk, 74-89 (2012)
