Education quality management based on educational statistics with data mining methods
Table of contents: The Kazakh-American Free University Academic Journal №6 - 2014
Bogorodskaya Oxana, Kazakh American Free University, Kazakhstan
Smailova Saule, D. Serikbaev East Kazakhstan state technical university, Kazakhstan
Uvalieva Indira, D. Serikbaev East Kazakhstan state technical university, Kazakhstan
The analysis of up-to-date education has revealed the need for reliable information about real education processes and education quality in order to implement an effective education policy, make decisions and carry out education reforms. For this purpose a unique base of education statistics built on original data has been created. According to the Government Program of Education Development in the Republic of Kazakhstan for 2011-2020, “The Base of Education Statistics Data” has been created at the national level. This unique base consists of 1659 indices and 1227 indicators for all levels of education.
EDUCATION QUALITY MANAGEMENT BASED ON EDUCATION STATISTICS
Today a Rating of regions is conducted in the Republic of Kazakhstan. It is needed for the analysis and estimation of the education systems and is based on indicators of education quality at the regional level.
Each of these indices is determined by the most significant indicators influencing the Rating of regions. Using the expert estimation method, the Rating scale is divided into 44 grades according to the weight coefficients of the significance of the indices.
More than 100 representatives of the education system from different regions of the Republic of Kazakhstan took part in a poll to determine the weight coefficients and their significance. The weight coefficient of each index is determined on a two-grade scale. As a result, the average weight coefficients of significance of all indices and indicators are determined. The Rating result grade of a region was calculated by formula (1) from four component grades: the Rating grade for education availability; the grade for the quality of financial, material and technical resources; the grade for personnel quality; and the grade for the results of education institutions.
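The rating procedure described above can be illustrated with a short computation. This is a minimal sketch under two assumptions that the text does not confirm: that the weight coefficients given by the respondents are averaged and normalized to sum to one, and that the Rating result grade is a weighted sum of the four component grades. The functions `average_weights` and `rating_grade`, the poll values and the grades are hypothetical.

```python
import numpy as np

# Hypothetical labels for the four component grades of the Rating.
COMPONENTS = ("availability", "resources", "personnel", "results")

def average_weights(poll: np.ndarray) -> np.ndarray:
    """Average each component's weight over all respondents and normalize to sum to 1.

    poll has shape (n_respondents, n_components). The normalization is an assumption.
    """
    mean_w = poll.mean(axis=0)
    return mean_w / mean_w.sum()

def rating_grade(component_grades: np.ndarray, weights: np.ndarray) -> float:
    """Assumed form of formula (1): a weighted sum of the component grades."""
    return float(component_grades @ weights)

# Hypothetical poll: three respondents score each component on a two-grade scale (1 or 2).
poll = np.array([[2, 1, 2, 2],
                 [2, 2, 1, 2],
                 [1, 2, 2, 2]], dtype=float)
weights = average_weights(poll)

region = np.array([30.0, 20.0, 25.0, 35.0])  # hypothetical component grades of one region
print(dict(zip(COMPONENTS, weights.round(3))))
print(rating_grade(region, weights))
```

With more than 100 real respondents the `poll` array would simply have more rows; the averaging step is unchanged.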
The proposed approach to education process management poses the problems of developing and using an information and analytical system. These problems face the education system of the Republic of Kazakhstan. They include modern technologies for gathering various data, organizing data and processing them for analysis, modern analysis and modeling methods, forecasting methods, and decision-making methods. These problems should be solved by Knowledge Discovery in Databases (KDD). KDD is the process of transforming data into knowledge; it includes data preprocessing, Data Mining (DM) methods, and the interpretation of the discovered patterns by an expert.
The application of data mining methods to data processing makes it possible:
- to determine objective patterns and trends in education data;
- to represent these patterns in diagrams and interactive visual tools;
- to create reports for presentations and business analytics;
- to analyze correlations and to make forecasts from education statistics data.
Data mining of education statistics by factor and cluster analysis methods is carried out by the following two-step algorithm. The first step generates generalized indices; the second step compares them and divides the entities into groups. Let us examine the algorithm in detail.
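For this purpose the principal component method of factor analysis is used. Let us develop the model (2):

$$X = AF + U, \qquad (2)$$

where $X = (x_1, \ldots, x_p)^T$ is the vector of observable variables (original features); $F = (f_1, \ldots, f_m)^T$ is the vector of unobservable common factors; $A = (a_{ij})$ is the $p \times m$ matrix of factor loadings; $U = (u_1, \ldots, u_p)^T$ is the vector of unique factors. Here F and U are uncorrelated; the unique factors $u_i$ are mutually uncorrelated random variables with zero mean and variance $d_i^2$; the common factors are uncorrelated random variables with zero mean and unit variance: $\operatorname{cov}(F) = E$ (E is the unit matrix).

For each standardized variable the variance splits into two parts (3):

$$\operatorname{var}(x_i) = 1 = h_i^2 + d_i^2, \qquad (3)$$

where $h_i^2$ is the communality, the part of the variance of $x_i$ determined by the common factors, and $d_i^2$ is the part of the variance of $x_i$ determined by the error (the unique factor).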
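To make the model concrete, the sketch below simulates data from X = AF + U and checks that the sample correlation matrix matches the implied structure $AA^T + D^2$, with each variable's unit variance split into communality $h_i^2$ (row sum of squared loadings) and unique variance $d_i^2$. The loading matrix, sample size and random seed are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings: p = 4 observed variables, m = 2 common factors.
A = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
h2 = (A ** 2).sum(axis=1)  # communalities h_i^2
d2 = 1.0 - h2              # unique variances d_i^2, so that h_i^2 + d_i^2 = 1

n = 200_000
F = rng.standard_normal((n, 2))                # common factors: zero mean, unit variance, uncorrelated
U = rng.standard_normal((n, 4)) * np.sqrt(d2)  # unique factors: zero mean, variance d_i^2
X = F @ A.T + U                                # each row is one observation of X = AF + U

R_model = A @ A.T + np.diag(d2)                # implied correlation matrix AA^T + D^2
R_sample = np.corrcoef(X, rowvar=False)
print(np.round(R_sample - R_model, 3))         # entries close to zero
```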
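In matrix formulation the model is (4):

$$R = AA^T + D^2, \qquad D^2 = \operatorname{diag}(d_1^2, \ldots, d_p^2), \qquad (4)$$

where R is the correlation matrix of the observable variables. Equation (4) is solved by the principal component method. Principal component analysis is applied to the reduced correlation matrix $R^{+} = R - D^2 = AA^T$, which has the communalities $h_i^2$ on its main diagonal. The coefficient of multiple correlation between each variable and all the other variables is used to obtain the initial communality estimates. The analysis is conducted according to the characteristic equation (5):

$$\left| R^{+} - \lambda E \right| = 0. \qquad (5)$$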
As a result, the eigenvalues (characteristic numbers) $\lambda_1 \ge \lambda_2 \ge \ldots$, forming the diagonal matrix $\Lambda$, and the matrix of eigenvectors V are obtained, and then the factor pattern matrix is determined: $A = V \Lambda^{1/2}$.
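One pass of this procedure can be sketched as follows. The sketch assumes that the initial communalities are the squared multiple correlations $1 - 1/(R^{-1})_{ii}$, a common choice consistent with the use of the multiple correlation coefficient described above; the function names `smc` and `factor_pattern` and the example matrix are hypothetical.

```python
import numpy as np

def smc(R: np.ndarray) -> np.ndarray:
    """Squared multiple correlation of each variable with all others: 1 - 1 / (R^-1)_ii."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def factor_pattern(R: np.ndarray, m: int) -> np.ndarray:
    """One pass of the principal factor method.

    The unit diagonal of R is replaced by communality estimates (here the SMCs),
    giving the reduced matrix R+; its eigen-decomposition solves |R+ - lambda*E| = 0,
    and the factor pattern is A = V * Lambda^(1/2) for the m largest eigenvalues.
    """
    R_plus = R.copy()
    np.fill_diagonal(R_plus, smc(R))
    eigvals, eigvecs = np.linalg.eigh(R_plus)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:m]         # indices of the m largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)      # guard against small negative eigenvalues
    return eigvecs[:, top] * np.sqrt(lam)       # column j is v_j * sqrt(lambda_j)

# Hypothetical correlation matrix of four variables driven by two factors.
R = np.array([[1.00, 0.58, 0.17, 0.22],
              [0.58, 1.00, 0.25, 0.26],
              [0.17, 0.25, 1.00, 0.56],
              [0.22, 0.26, 0.56, 1.00]])
A = factor_pattern(R, m=2)
print(np.round(A, 3))
```

Each column of A has squared length equal to its eigenvalue, which is what $A = V\Lambda^{1/2}$ means column by column.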
To determine the communalities and factor loadings, an empirical iterative algorithm is used, which amounts to a refinement of the parameter estimates. The algorithm is as follows: the initial estimates of the factor loadings are determined by the principal factor method. On the basis of the correlation matrix R, the principal components and unique factors are estimated (6):

$$p_j = \frac{1}{\lambda_j} \sum_{i=1}^{p} a_{ij} x_i, \qquad j = 1, \ldots, m, \qquad (6)$$

where $p_j$ are the principal components; $a_{ij}$ are the coefficients of the common factors (loadings); $x_i$ are the basic data (column vectors); $\lambda_j$ is the corresponding eigenvalue of matrix R.
The communality estimates are computed as (7):

$$h_i^2 = \sum_{j=1}^{m} a_{ij}^2. \qquad (7)$$
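The iterative algorithm and the component formula (6) can be sketched together. This is a minimal sketch, not necessarily the authors' exact procedure: it assumes the iteration starts from squared multiple correlations, places the current communalities on the diagonal of $R^{+}$, re-estimates them by (7), and stops when they change by less than a tolerance. The function names, the tolerance and the one-factor example matrix are hypothetical.

```python
import numpy as np

def iterated_principal_factors(R: np.ndarray, m: int, tol: float = 1e-8, max_iter: int = 1000):
    """Iterated principal factor method (a sketch).

    1. Start from squared multiple correlations as communality estimates.
    2. Put the communalities on the diagonal of R (the reduced matrix R+).
    3. Extract A = V * sqrt(Lambda) for the m largest eigenvalues.
    4. Re-estimate communalities as h_i^2 = sum_j a_ij^2, equation (7).
    Steps 2-4 repeat until the communalities change by less than `tol`.
    """
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        R_plus = R.copy()
        np.fill_diagonal(R_plus, h2)
        eigvals, eigvecs = np.linalg.eigh(R_plus)
        top = np.argsort(eigvals)[::-1][:m]
        A = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = (A ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return A, h2

def component_scores(X_std: np.ndarray, m: int) -> np.ndarray:
    """Principal components p_j = (1 / lambda_j) * sum_i a_ij x_i, equation (6).

    X_std holds standardized data (one column per variable); lambda_j and the
    loadings a_ij come from the eigen-decomposition of its correlation matrix R.
    """
    R = np.corrcoef(X_std, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    top = np.argsort(eigvals)[::-1][:m]
    lam = eigvals[top]
    A = eigvecs[:, top] * np.sqrt(lam)
    return (X_std @ A) / lam

# Hypothetical one-factor correlation matrix with loadings 0.9, 0.8, 0.7, 0.6.
loadings = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)
A, h2 = iterated_principal_factors(R, m=1)
print(np.round(h2, 4))  # approaches loadings**2 = [0.81, 0.64, 0.49, 0.36]
```

Dividing by $\lambda_j$ in (6) gives components with unit variance, which is why they can serve directly as standardized factor scores in the clustering step.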
The following criteria are used to determine the number of common factors: the number of factors can be estimated from their substantive interpretation; the number of common factors m can be taken equal to the number of eigenvalues greater than or equal to one; or the number of factors can be chosen so that they explain a specified share of the total variance or total …
Education statistics data on general secondary education in East Kazakhstan (17 regions, excluding Ust-Kamenogorsk and Semey) for 2012 were processed and analyzed. These variables can be called original because they represent the original data; in total there are 22 of them. It is then necessary to normalize the obtained data so that they do not depend on absolute quantitative indices; the normalized values are called secondary data. As a result, 17 variables were obtained. Factorization (data reduction) identified 6 main factors, which explain 80.5% of the total variance (Table 1).
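The eigenvalue criterion and the explained-variance criterion for choosing the number of factors can be computed directly from a correlation matrix. The correlation matrix of the 17 secondary variables is not given in the text, so the example uses a small hypothetical matrix; the function name `choose_n_factors` and the `share` parameter are assumptions.

```python
import numpy as np

def choose_n_factors(R: np.ndarray, share: float = 0.8):
    """Two criteria for the number of common factors m.

    kaiser: number of eigenvalues of R that are >= 1.
    by_share: smallest m whose eigenvalues explain at least `share` of the total variance.
    Also returns the cumulative shares of explained variance.
    """
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]    # descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()   # eigvals.sum() equals trace(R), the number of variables
    kaiser = int(np.sum(eigvals >= 1.0))
    by_share = int(np.searchsorted(cumulative, share)) + 1
    return kaiser, by_share, cumulative

# Hypothetical correlation matrix: two pairs of correlated variables.
block = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
R = np.block([[block, np.zeros((2, 2))],
              [np.zeros((2, 2)), block]])
kaiser, by_share, cumulative = choose_n_factors(R, share=0.75)
print(kaiser, by_share, np.round(cumulative, 3))
```

The last entry of `cumulative` for the retained m factors is the figure reported as the share of total variance explained (80.5% for the 6 factors in Table 1).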
Let us interpret the data using the rotated component matrix (Table 2).
Factor 1 is connected with the student-teacher ratio, the average size of a school, and the number of students per class. It can be characterized as the network factor. Factor 2 is connected with the share of teachers with higher education, the share of teachers of the first, second and highest categories, and the share of teachers over 60 years old. It is the resources factor. Factor 3 includes the share of local budget expenditures on educators' wages, the share of expenditures on purchasing and delivering books, and the share of administrators of general secondary education who have completed refresher courses. It can be called the management factor. Factor 4 includes the share of schools in emergency condition, the share of schools in standard-design buildings, and the share of schools holding classes in the afternoon shift. It is the conditions factor. Factor 5 includes the share of schools fully provided with Internet traffic, the share of schools with language and multimedia laboratories, and the number of students per computer. It is the informatization factor. Factor 6 is connected with the number of students receiving additional education and the number of students engaged in after-class activities. It can be called the factor of conditions for students' individual needs (the additional education factor).

A characteristic feature of factor analysis is that the resulting factors, unlike the original variables, are independent. Under multicollinearity, a regression equation is therefore better built on the principal components: the components are linear functions of all the original variables, and they are mutually uncorrelated.
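The advantage of principal components under multicollinearity can be illustrated with principal component regression. This is a minimal sketch on synthetic data (the variables, coefficients, sample size and the choice to keep one component are assumptions, not the authors' data): two nearly collinear predictors give an ill-conditioned correlation matrix, while their principal component scores are uncorrelated, so a least-squares fit on the leading component is stable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)           # nearly collinear with x1
y = 2.0 * x1 + 2.0 * x2 + rng.standard_normal(n)  # hypothetical response

X = np.column_stack([x1, x2])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Principal components: eigenvectors of the correlation matrix, largest eigenvalue first.
R = np.corrcoef(X_std, rowvar=False)
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
Z = X_std @ V                                     # component scores, mutually uncorrelated

print("condition number of R:", eigvals[0] / eigvals[-1])
print("correlation between components:", np.corrcoef(Z, rowvar=False)[0, 1])

# Principal component regression: ordinary least squares on the first component only.
design = np.column_stack([np.ones(n), Z[:, 0]])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_std = V[:, :1] @ coef[1:]                    # slopes expressed for standardized x1, x2
print("PCR slopes for standardized x1, x2:", beta_std)
```

Dropping the second component discards the direction in which the two predictors barely differ, which is exactly the direction that makes an ordinary regression on x1 and x2 unstable.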
The second step is classification by several generalized indices (the principal components) obtained with the factor analysis method.
Cluster analysis methods were used for the classification. The advantage of cluster analysis is that classification can be done not by one variable but by a number of variables. This makes it possible not only to identify groups of similar entities, but also to divide the entities into clusters and to interpret the result.
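The paper does not name the clustering algorithm used in the second step. As an assumption, the sketch below swaps in k-means (Lloyd's algorithm) with a deterministic maximin initialization, applied to a matrix of factor scores with one row per region and one column per factor. The returned centers are the average factor values per cluster, the kind of summary reported in Table 3. The function name `kmeans`, its parameters and the random scores are hypothetical.

```python
import numpy as np

def kmeans(Z: np.ndarray, k: int, n_iter: int = 100):
    """Plain k-means on factor scores Z (shape: n_regions x n_factors).

    Returns cluster labels and cluster centers (mean factor scores per cluster).
    """
    # Maximin initialization: start from the first region, then repeatedly add the
    # region farthest from all centers chosen so far (deterministic, no random seed).
    centers = [Z[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[nearest.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each region to the nearest center by Euclidean distance.
        labels = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return labels, centers

# Hypothetical factor scores: 17 regions x 6 factors.
rng = np.random.default_rng(0)
scores = rng.standard_normal((17, 6))
labels, centers = kmeans(scores, k=4)
print(labels)
print(np.round(centers, 2))  # average factor value per cluster
```

Hierarchical methods such as Ward's linkage are an equally common choice for a small number of regions; k-means is used here only because it fits in a few lines.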
As a result, the regions were divided into 4 clusters (Table 3). Table 3 also gives the average value of each factor within each cluster.
The average indices for the four clusters are presented in Figure 1.
Figure 1. The average indices for each cluster (the figures on the graphs are the numbers of …)
Note the distances between the centers of the four clusters (Table 4).
As can be seen, clusters 2, 3 and 4 are approximately equidistant from each other: the Euclidean distances between them are 3.045, 3.195 and 3.373. Cluster S1 is the most significant for living standards; the distances between S1 and S2, S3 and S4 are 2.98, 2.207 and 2.676 respectively.
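A distance table like Table 4 can be computed from the cluster centers, that is, from the average factor scores of each cluster. The centers behind Table 4 are not reproduced in the text, so the example uses hypothetical centers in a two-factor space; the function name `center_distances` is an assumption.

```python
import numpy as np

def center_distances(centers: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between cluster centers (one center per row)."""
    diff = centers[:, None, :] - centers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical centers of three clusters in a two-factor space.
centers = np.array([[0.0, 0.0],
                    [3.0, 4.0],
                    [6.0, 8.0]])
D = center_distances(centers)
print(D)
```

Reading one row of the matrix gives the distances from one cluster to all the others, which is how the distances from S1 to S2, S3 and S4 are compared above.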
The first cluster is characterized by a high level of “The management factor”, while “The conditions factor” and “The network factor” are negative. The second cluster is characterized by the highest levels of all indices except “The informatization factor”. It consists of two industrially developed regions (Ziryanovsk and Kurchatov). The third cluster consists of four regions and has high levels of “The network factor” and “The additional education factor”. The remaining two regions, which form the fourth cluster, have high levels of “The resources factor”, “The conditions factor” and “The management factor”; the other factors are negative.
The current approach to education quality management in the Republic of Kazakhstan has a number of disadvantages, such as the need to involve more than 100 educators from different regions of the Republic. The rank approach only orders a number of entities; it does not determine the degree of imbalance between them. Therefore a new approach to education quality management, based on education statistics and data mining methods, was developed. The developed information and analytical system for education quality management includes an information and reference system, monitoring facilities for education processes, instruments for the analysis and estimation of education quality, decision-making tools for improvement activities in the sphere of education, and planning of preventive and corrective activities.