Market segmentation methods. Cluster segmentation method

I work in the email marketing industry for a site called MailChimp.com. We help clients create newsletters for their advertising audience. Every time someone calls our work “email blasts,” my heart grows a little colder.

Why? Because email addresses are no longer black boxes that you bombard with messages like grenades. In email marketing (as in other forms of online contact, including tweets, Facebook posts, and Pinterest campaigns), businesses learn how their audience engages at the individual level by tracking clicks, online orders, shares on social networks, and so on. This data is not just noise. It characterizes your audience. But to the uninitiated, it might as well be Greek. Or Esperanto.

How do you collect transactional data from your customers (users, subscribers, etc.) and use their data to better understand your audience? When you deal with many people, it is difficult to study each client individually, especially if they all contact you differently. Even if in theory you could reach everyone personally, in practice this is unlikely to be feasible.

You need to take your customer base and find a middle ground between random bombardment and personalized marketing for each individual customer. One way to achieve this balance is by using clustering to segment your customer market so that you can appeal to different segments of your customer base with different targeted content, offers, etc.

Cluster analysis means taking a collection of objects and dividing it into groups of similar ones. By working with these groups - identifying what their members have in common and what sets them apart - you can learn a lot about the jumble of data you have. This knowledge will help you make better decisions, and at a more detailed level than before.

In this context, clustering is called exploratory data mining because these techniques help to “pull out” information about relationships in data sets too large to take in by eye. And discovering connections in social groups is useful in any industry - for recommending films based on the habits of the target audience, for identifying crime hot spots in a city, or for justifying financial investments.

One of my favorite uses of clustering is image clustering: lumping together image files that "look the same" to the computer. For example, in image hosting services like Flickr, users produce a ton of content and simple navigation becomes impossible due to the large number of photos. But using clustering techniques, you can group similar images together, allowing the user to navigate between these groups before detailed sorting.

Supervised or unsupervised machine learning?

In exploratory data mining, by definition, you don't know ahead of time what you're looking for. You are an explorer. You can clearly explain when two customers look similar and when they look different, but you don't know the best way to segment your customer base. That's why "asking" a computer to segment your customer base for you is called unsupervised machine learning: you're not supervising, that is, you're not telling the computer how to do its job.

In contrast, there is supervised machine learning, which tends to be what people mean when artificial intelligence hits the front page. If I know that I want to divide customers into two groups - say, "likely to buy" and "unlikely to buy" - and I feed the computer historical examples of such customers so that it can assign every new customer to one of these groups, that is supervised learning.

If instead I say, “Here's what I know about my clients and here's how to tell whether they're different or the same. Tell me something interesting,” that is unsupervised learning.

This chapter examines the simplest clustering method, called k-means, which dates back to the 1950s and has since become a staple of knowledge discovery in databases (KDD) across all industries and government agencies.

The k-means method is not the most mathematically rigorous of all methods. It was born of practicality and common sense - like African-American cooking. It does not have the chic pedigree of French cuisine, but it often hits the spot. Cluster analysis with k-means, as you'll soon see, is part math and part storytelling. Its undoubted advantage is its intuitive simplicity.

Let's see how this method works using a simple example.

Girls dance with girls, boys scratch their heads

The goal of k-means clustering is to take a set of points in space and divide them into k groups (where k is any number you choose). Each group is defined by a point at its center, like a flag planted on the moon and signaling: “Hey, here's the center of my group! Join if you are closer to this flag than to the others!” This group center (officially called the cluster centroid) is the “mean” in the name k-means.

Let's take school dances as an example. If you have managed to erase the horror of this “entertainment” from your memory, I am very sorry for bringing back such painful memories.

The heroes of our example - students from Makakne High School who came to a dance evening under the romantic name "Ball at the Bottom of the Sea" - are scattered around the assembly hall, as shown in Fig. 1. I even painted the parquet floor in Photoshop to make it easier to imagine the situation.

Fig. 1. Makakne High School students sit in the assembly hall

Here are examples of songs that these young leaders of the free world will clumsily dance to (if you suddenly want music accompaniment, for example, on Spotify):

Styx: Come Sail Away
Everything But the Girl: Missing
Ace of Base: All that She Wants
Soft Cell: Tainted Love
Montell Jordan: This is How We Do It
Eiffel 65: Blue

K-means clustering requires you to choose the number of clusters into which you want to divide those present. Let's start with three clusters (we'll look at choosing k later in this chapter). The algorithm places three flags on the floor of the assembly hall in some acceptable way, as shown in Fig. 2, where you see 3 initial flags spread across the floor and marked with black circles.

Fig. 2. Placement of initial cluster centers

In k-means clustering, dancers are assigned to their closest cluster center, so that a line of demarcation can be drawn between any two centers on the floor. Thus, if the dancer is on one side of the line, he belongs to one group, if on the other side, then to another (as in Fig. 3).

Fig. 3. Lines mark cluster boundaries

Using these demarcation lines, divide the dancers into groups and color them accordingly, as in Fig. 4. This diagram, which divides space into polygons defined by proximity to a particular cluster center, is called a Voronoi diagram.

Fig. 4. Grouping into clusters marked by different background patterns in a Voronoi diagram

Let's look at our initial division. Something's wrong, isn't it? The space is divided in a rather strange way: the lower left group remains empty, and on the border of the upper right group, on the contrary, there are many people.

The k-means clustering algorithm moves the cluster centers across the floor until it reaches the best result.

How to determine the “best result”? Each person present is some distance from their cluster center. The smaller the average distance from the participants to the center of their group, the better the result.

Now we introduce the word “minimization” - it will be very useful to you in optimizing the model for the best location of cluster centers. In this chapter, you will make Solver move cluster centers countless times. Solver finds the best location for the cluster centers by iteratively moving them around the floor, keeping the best results found and combining them (literally mating them like racehorses) to find the best location.

So if the diagram in Fig. 4 looks rather poor, Solver might arrange the centers as in Fig. 5. This slightly reduces the average distance between each dancer and his center.

Fig. 5. Slightly shifted centers

Sooner or later Solver will work out that the centers must be placed in the middle of each group of dancers, as shown in Fig. 6.

Fig. 6. Optimal clustering at school dances

Great! This is what ideal clustering looks like. Cluster centers are located at the center of each group of dancers, minimizing the average distance between a dancer and the nearest center. Now that the clustering is complete, it's time to move on to the fun part, which is trying to understand what these clusters mean.

If all you know about the dancers is their hair color, their political preferences, or their time in the 100-meter dash, then the clusters don't make much sense.

But once you look at the age and gender of those present, you will begin to see some general trends. The small group at the bottom is older people, most likely chaperones. The group on the left is all boys, and the group on the right is all girls. And everyone is very afraid to dance with each other.

Thus, k-means allowed you to divide many dancegoers into groups and correlate the characteristics of each attendee with membership in a particular cluster to understand the reason for the division.

Now you are probably saying to yourself: “Come on, what nonsense. I already knew the answer before starting.” You're right. In this example - yes. I deliberately gave such a “toy” example, being sure that you can solve it just by looking at the dots. The action takes place in a two-dimensional space, in which clustering is done simply with the help of the eyes.

But what if you run a store that sells thousands of products? Some buyers have made one or two purchases in the last two years. Others - dozens. And everyone bought something of their own.

How do you cluster them on such a “dance floor”? To begin with, this dance floor is not two-dimensional, or even three-dimensional. It is a thousand-dimensional product space in which, in each dimension, the buyer either purchased the product or did not. You can see how quickly the clustering problem goes beyond the capabilities of a “first-rate eyeball,” as my military friends like to say.

Real Life: K-Means Clustering in Email Marketing

Let's move on to a more specific case. I'm an email marketer, so I'll give you an example from MailChimp.com, where I work. This same example will work with data from retail, ad traffic conversion, social media, etc. It works with almost any type of data in which you reach out to customers with marketing material and they then choose whether to engage with you.

Joey Bag O'Donuts Wholesale Wine Empire

Imagine for a moment that you live in New Jersey, where you run Joey Bag O'Donuts Wholesale Wine Empire. It is an import-export business whose purpose is to ship large quantities of wine from overseas and sell it to certain liquor stores throughout the country. The way this business works is that Joey travels all over the world looking for incredible deals on lots of wine. He sends it to his home in Jersey, and it's up to you to put it into stores and make a profit.

You find customers in many ways: a Facebook page, a Twitter account, sometimes even direct mail - but email drives most of the business. Last year you sent one email per month. Usually each email describes two or three deals, say one for champagne and another for malbec. Some deals are amazing - 80% off or more. In total, you offered 32 deals over the year, and all of them went more or less smoothly.

But just because things are going well doesn't mean they can't get better. It would be useful to understand the motives of your customers a little deeper. Of course, looking at a specific order, you see that a certain Adams bought some sparkling wine in July with a 50% discount, but you cannot determine what prompted him to buy. Did he like the minimum order quantity of one box of six bottles or the price that had not yet risen to its maximum?

It would be nice to be able to divide your client list into interest groups. Then you could tailor the emails to each group separately and, perhaps, grow your business even more. Any deal that suits a group could become the subject line of its email and appear in the first paragraph of the text. This type of targeted mailing can cause a real explosion in sales!

There is an option to let the computer do the work for you. Using k-means clustering, you can find the best grouping and then try to understand why it is the best.

Original Dataset

The Excel document that we will analyze in this chapter is located on the book's website. It contains all the source data in case you want to work with it. Or you can simply follow the text by looking at the remaining sheets of the document.

To start, you have two interesting data sources:

metadata for each offer is stored in a spreadsheet, including varietal, minimum quantity of wine per order, retail discount, whether the price has passed its peak, and country of origin. This data is located in a tab called OfferInformation, as shown in Fig. 7;
you know which customers ordered what, so you can pull that information out of MailChimp and put it into the spreadsheet with the offer metadata, in the Transactions tab. This transactional data, shown in Fig. 8, is very simple: the buyer and the offer he took.

Fig. 7. Details of the last 32 offers

Fig. 8. List of orders by customer

Determining the subject of measurement

And here's the challenge. In the school dance problem, measuring the distance between those present and identifying cluster centers was easy, right? You just need to find the right tape measure! But what to do now?

You know that last year there were 32 deal offers and you have a list of 324 orders in a separate tab, broken down by buyer. But to measure the distance from each buyer to the cluster center, you must place them in this 32-deal space. In other words, you need to figure out which deals each customer did not take and create a deal-by-customer matrix in which each customer gets their own column of 32 deal cells, filled with ones where the deal was taken and zeros where it was not.

That is, you need to take this row-oriented table of orders and turn it into a matrix, with offers as rows and customers as columns. The best way to create it is with a pivot table.

Here is the procedure: on the Transactions sheet, select columns A and B, and then insert a pivot table. Using the PivotTable Wizard, simply select Deals as the row header and Customers as the column header, and fill in the values. A cell will be 1 if the customer-deal pair exists and 0 if it does not (in this case, 0 is shown as an empty cell). The result is the table shown in Fig. 9.

Fig. 9. Customer-deal summary table
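
For readers who prefer code to pivot tables, the same reshaping can be sketched in a few lines of Python. This is only an illustration, not part of the workbook: the `transactions` rows and the offer numbers in them are made up, and `build_matrix` is a hypothetical helper name.

```python
# Hypothetical (customer, offer number) rows standing in for the Transactions tab.
# The offer numbers are invented for illustration only.
transactions = [
    ("Adams", 18), ("Adams", 29), ("Adams", 30),
    ("Allen", 9), ("Allen", 27),
]
NUM_OFFERS = 32  # the chapter's 32 deal offers


def build_matrix(rows, num_offers):
    """Return {customer: [0/1 for offers 1..num_offers]}.

    Each customer's list plays the role of one column in the pivot table:
    1 where the customer took the deal, 0 where they did not.
    """
    matrix = {}
    for customer, offer in rows:
        vector = matrix.setdefault(customer, [0] * num_offers)
        vector[offer - 1] = 1  # offers are numbered from 1
    return matrix


matrix = build_matrix(transactions, NUM_OFFERS)
print(len(matrix["Adams"]), sum(matrix["Adams"]))
```

Unlike the pivot table, the zeros here are stored explicitly rather than shown as empty cells, which is what the distance calculations later on need anyway.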

Now that you have your order information in a matrix format, copy the OfferInformation sheet and name it Matrix. In this new worksheet, paste the values from the pivot table (no need to copy and paste the deal number because it's already in the order information), starting with column H. You should end up with an expanded version of the matrix, complete with order information like in Fig. 10.

Fig. 10. Descriptions of transactions and order data merged into a single matrix

Data Standardization

In this chapter, every dimension of your data is of the same kind: binary order information. But in many clustering situations that is not the case. Imagine a scenario in which people are clustered by height, weight, and salary. These three types of data are on different scales. Height can vary from 1.5 to 2 meters, while weight can range from 50 to 150 kg.

In this context, measuring the distance between customers (like between dancers in an assembly hall) becomes a confusing matter. Therefore, it is common to standardize each column of data by subtracting the mean and then dividing by a measure of dispersion called the standard deviation. This brings all columns to a common scale, with values varying around 0.
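
Here is a minimal Python sketch of that standardization. The height and weight values are invented to match the ranges mentioned above, `standardize` is a hypothetical helper, and I assume the population standard deviation (the sample version would work just as well).

```python
from statistics import mean, pstdev


def standardize(column):
    """Subtract the column mean and divide by the standard deviation.

    Assumes the column is not constant (otherwise the deviation is 0).
    """
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation, an assumption
    return [(x - mu) / sigma for x in column]


# Made-up sample values on very different scales
heights_m = [1.5, 1.6, 1.75, 1.8, 2.0]
weights_kg = [50, 65, 80, 95, 150]

z_heights = standardize(heights_m)
z_weights = standardize(weights_kg)
print([round(z, 2) for z in z_heights])
print([round(z, 2) for z in z_weights])
```

After the transformation both columns have mean 0 and standard deviation 1, so neither one dominates a distance calculation simply because of its units.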

Let's start with four clusters

Well, now all your data is reduced to a single convenient format. To start clustering, you need to select k - the number of clusters in the k-means algorithm. A common way to use k-means is to take a set of different k's and test them one at a time (I'll explain how to choose them later), but we're just getting started - so we'll just pick one.

You will need a number of clusters that is roughly appropriate for what you want to do. You obviously don't intend to create 50 clusters and send 50 targeted promotional emails to a couple of guys from each group. This immediately defeats the purpose of our exercise. In our case, we need something small. Start this example with 4 - in an ideal world, you would probably divide your client list into 4 clear groups of 25 people each (which is unlikely in reality).

So, if you have to divide buyers into 4 groups, what is the best way to select them?

Instead of ruining the nice Matrix sheet, copy the data into a new sheet and call it 4MC. Now you can insert 4 columns after the price-peak column, in columns H to K, which will hold the cluster centers. (To insert a column, right-click on column H and select Insert. The column will appear on the left.) Name these columns Cluster 1 through Cluster 4. You can also apply conditional formatting to them so that, whenever their values are set, you can see how they differ.

The 4MC sheet will appear as shown in Fig. 11.

Fig. 11. Empty cluster centers placed on a 4MC sheet

In this case, all cluster centers are zeros. But technically they can be anything and, as at the school dance, what you want is for them to be placed so that they minimize the distance between each buyer and his cluster center.

Obviously, then these centers will have values from 0 to 1 for each transaction, since all client vectors are binary.

But what does it mean to “measure the distance between the cluster center and the customer”?

Euclidean distance: measuring distances directly

You have a separate column for each customer. How do you measure the distance between them? In geometry, the length of the shortest path between two points, a straight line, is called the Euclidean distance.

Let's return to the assembly hall for a moment and try to understand how to solve our problem there.

Let's place the coordinate axes on the floor and in Fig. 12 we will see that at point (8,2) we have a dancer, and at (4,4) we have a cluster center. To calculate the Euclidean distance between them, you will have to remember the Pythagorean theorem, which you have been familiar with since school.

Fig. 12. Dancer at (8,2) and cluster center at (4,4)

These two points are 8 - 4 = 4 meters apart horizontally and 4 - 2 = 2 meters apart vertically. According to the Pythagorean theorem, the square of the distance between the two points is 4^2 + 2^2 = 20 square meters. From here we calculate the distance itself, which is the square root of 20, approximately 4.47 m (as in Fig. 13).

Fig. 13. The Euclidean distance is equal to the square root of the sum of the squared distances in each direction
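
The same arithmetic can be checked with a short Python sketch. The function name `euclidean` is my own; it works for points with any number of coordinates, which is what matters once we leave the two-dimensional dance floor.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


dancer = (8, 2)
center = (4, 4)
distance = euclidean(dancer, center)
print(round(distance, 2))  # 4.47, the square root of 20
```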

In the context of newsletter subscribers, you have more than two dimensions, but the same concept applies. The distance between the buyer and the cluster center is calculated by taking the differences between the two points for each deal, squaring them, adding them up, and taking the square root. For example, on the 4MC sheet, you want to know the Euclidean distance between the center of cluster 1 in column H and customer Adams' orders in column L.

In cell L34, under the Adams orders, you can calculate the difference between the Adams vector and the cluster center, square it, sum it, and then take the square root using the following array formula (note the absolute references, which let you drag this formula to the right or down without changing the reference to the cluster center):

{=SQRT(SUM((L$2:L$33-$H$2:$H$33)^2))}

You need an array formula (type the formula and press Ctrl+Shift+Enter, or Cmd+Return on macOS, as stated in Chapter 1) because the (L2:L33-H2:H33)^2 part has to go through the ranges cell by cell, calculating each difference and squaring it. The end result, however, is a single number, in our case 1.732 (as in Fig. 14). This makes sense: Adams took three deals, and since the initial cluster centers are all zeros, the answer is the square root of 3, namely 1.732.

Fig. 14. Distance between cluster center 1 and Adams

In the spreadsheet in Fig. 14, I froze panes (see Chapter 1) between columns G and H and labeled row 34 in cell G34 “Distance to Cluster 1,” just so I could see what was where as I scrolled down the page.

Distances and cluster membership for everyone!

Now you know how to calculate the distance between the order vector and the cluster center.

Now it's time to add Adams' distances to the remaining cluster centers by dragging cell L34 down to L37 and then manually changing the cluster center reference from column H to columns I, J, and K in the cells below. The result should be the following 4 formulas in L34:L37:

{=SQRT(SUM((L$2:L$33-$H$2:$H$33)^2))}
{=SQRT(SUM((L$2:L$33-$I$2:$I$33)^2))}
{=SQRT(SUM((L$2:L$33-$J$2:$J$33)^2))}
{=SQRT(SUM((L$2:L$33-$K$2:$K$33)^2))}

Since you used absolute references for the cluster centers (that's what the $ sign in the formulas means, as explained in Chapter 1), you can drag L34:L37 across to DG34:DG37 to calculate the distance from each customer to all four cluster centers. Label the rows in column G, in cells G35 to G37, “Distance to Cluster 2,” etc. The newly calculated distances are shown in Fig. 15.

Fig. 15. Calculation of distances from each buyer to all cluster centers
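
As a rough Python counterpart to rows 34 to 37, the sketch below computes the distance from each customer to each of four all-zero cluster centers. The offer numbers assigned to Adams are hypothetical (the text only tells us he took three deals), and the `customers` dictionary stands in for columns L to DG.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


NUM_OFFERS = 32

# Four cluster centers, all zeros, like the empty columns H to K
centers = [[0.0] * NUM_OFFERS for _ in range(4)]

# Adams took three deals; which three is made up for this illustration
adams = [0] * NUM_OFFERS
for offer in (18, 29, 30):
    adams[offer - 1] = 1
customers = {"Adams": adams}

# One list of four distances per customer (rows 34 to 37 of one column)
distances = {
    name: [euclidean(vector, center) for center in centers]
    for name, vector in customers.items()
}
print([round(d, 3) for d in distances["Adams"]])  # [1.732, 1.732, 1.732, 1.732]
```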

Now you know the distance from each customer to all four cluster centers. Customers are assigned to clusters by shortest distance, in two steps.

First, let's go back to Adams in column L and calculate the minimum distance to a cluster center in cell L38. It's simple:

=MIN(L34:L37)

To find out which cluster center that minimum belongs to, use the MATCH formula (more details in Chapter 1). Placed in L39, it returns the position within the range L34:L37 (counting from 1) of the cell that holds the minimum distance:

=MATCH(L38,L34:L37,0)

In this case, the distance is the same for all four clusters, so the formula selects the first one (L34) and returns 1 (Fig. 16).

Fig. 16. Adding cluster assignments to the sheet

You can also drag these two formulas across to DG38:DG39. To be even more organized, label rows 38 and 39 in cells G38 and G39: “Minimum Cluster Distance” and “Assigned Cluster.”
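
The MIN and MATCH pair translates almost word for word into Python. In this sketch `assign` is a hypothetical helper, and the second list of distances is invented to show a case without ties.

```python
def assign(distances_to_centers):
    """Return (minimum distance, 1-based cluster number).

    list.index returns the first position holding the minimum, so ties
    go to the lowest-numbered cluster, just as MATCH with 0 does.
    """
    shortest = min(distances_to_centers)
    cluster = distances_to_centers.index(shortest) + 1
    return shortest, cluster


print(assign([1.732, 1.732, 1.732, 1.732]))  # (1.732, 1): a four-way tie
print(assign([2.1, 0.9, 1.4, 1.4]))          # (0.9, 2)
```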

Finding solutions for cluster centers

Your spreadsheet now has the distance calculations and the cluster assignments. To determine the best location of the cluster centers, you need to find the values in columns H to K that minimize the total distance between the buyers and the cluster centers to which they are assigned, as indicated in row 39 for each buyer.

When you hear the word “minimize,” the optimization stage begins, and optimization is done with Solver.

To use Solver, you'll need an objective cell, so in A36 we'll sum up all the distances between customers and their cluster centers:

=SUM(L38:DG38)

This sum of the distances from customers to their nearest cluster centers is exactly the objective function we encountered earlier when clustering the Makakne High School assembly hall. But Euclidean distance, with its squares and square roots, is a monstrously nonlinear function, so you'll have to use the evolutionary solving algorithm instead of the simplex method.
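
To make the objective concrete, here is a small Python sketch of the quantity that sits in A36. The data is a made-up toy with 3 deals instead of 32, and `total_distance` is a hypothetical name; the point is only that better-placed centers give a smaller total.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def total_distance(customers, centers):
    """Sum, over all customers, of the distance to the nearest center."""
    return sum(min(euclidean(vec, c) for c in centers) for vec in customers)


# Invented toy data: three customers, three deals, two cluster centers
customers = [[1, 1, 1], [1, 1, 0], [0, 0, 1]]
zero_centers = [[0, 0, 0], [0, 0, 0]]
better_centers = [[1, 1, 0.5], [0, 0, 1]]

print(round(total_distance(customers, zero_centers), 3))    # 4.146
print(round(total_distance(customers, better_centers), 3))  # 1.0
```

Minimizing this number by moving the centers is exactly the job handed to the optimizer.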

You already used the simplex method in Chapter 1. The simplex algorithm, where it can be used, works faster than the others, but it cannot handle roots, squares, and other nonlinear functions. OpenSolver, which uses a simplex algorithm, albeit one that looks as if it were on steroids, is just as useless here.

In our case, the evolutionary algorithm built into Solver uses a combination of random search and “breeding” of good solutions to find effective solutions, much as evolution does in a biological context.

You have everything you need to set up the problem in Solver:

goal: minimize the total distance from customers to their cluster centers (A36);
variables: the deal vector of each cluster center (H2:K33);
constraints: cluster center values must range from 0 to 1.

Open Solver and hammer in these settings. Set Solver to minimize A36 by changing the values of H2:K33, subject to the constraint H2:K33<=1, just like all the deal vectors. Make sure that the variables are marked as non-negative and that the evolutionary algorithm is selected (Fig. 17).

Fig. 17. Solver settings for 4-center clustering

But setting up the problem is not everything. You will have to sweat a little, selecting the necessary options for the evolutionary algorithm by clicking the “Options” button in the Solver window and going to the settings window. I advise you to set the maximum time to 30 seconds or more, depending on how long you are willing to wait for Solver to do its job. In Fig. 18 I set mine to 600 seconds (10 minutes). This way I can run Solver and go to lunch. And if you want to stop it early, just press Escape and exit with the best solution it has found so far.

Fig. 18. Evolutionary algorithm parameters

Click Solve and watch Excel do its thing until the evolutionary algorithm converges.
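
Outside Excel, k-means is usually run not with an evolutionary optimizer but with a simple alternating procedure known as Lloyd's algorithm: assign every point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. The sketch below swaps that technique in for Solver, so it is not the method used in this chapter. Note also that it minimizes the sum of squared distances rather than the plain sum in A36, so its answers need not coincide with Solver's. The sample points are invented.

```python
import math
import random


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centers, labels) with 0-based labels."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest center for every point
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Two obvious groups of made-up "dancers"
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)
print(centers)
```

Like the evolutionary algorithm, this procedure depends on its random starting point, so different seeds can number the clusters differently or, on harder data, settle on different groupings.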

The meaning of the results obtained

Once Solver gives you the optimal cluster centers, the fun begins. Let's move on to studying the groups! In Fig. 19, we see that Solver found an optimal total distance of 140.7, and all four cluster centers - thanks to conditional formatting! - look completely different.

Fig. 19. Four optimal cluster centers

Keep in mind that your cluster centers may differ from those presented in the book because the evolutionary algorithm uses random numbers and the answer is different each time. The clusters may be completely different or, more likely, in a different order (for example, my cluster 1 may be very close to your cluster 4, etc.).

Since when creating the sheet you inserted transaction descriptions into columns B through G, you can now read the details in Fig. 19, which is important for understanding the idea of cluster centers.

For cluster 1, in column H, the conditional formatting highlights deals 24, 26, 17, and, to a lesser extent, 2. Reading the descriptions of these deals, you can see what they have in common: they are all pinot noir.

Looking at column I, you will see that all green cells have low minimum quantities. These are buyers who do not want to purchase huge quantities during the transaction process.

But the other two cluster centers, frankly speaking, are difficult to interpret. Instead of interpreting cluster centers, how about we study the buyers in the cluster themselves and determine what kind of deals they like? This could clarify the issue.

Rating of transactions using the cluster method

Instead of working out which deal values in each cluster center are closest to 1, let's check who is assigned to which cluster and which deals they prefer.

To do this, we'll start by copying the OfferInformation sheet. Let's call the copy 4MC - TopDealsByCluster. Number the columns H through K on this new sheet from 1 to 4 (as in Figure 20).

Fig. 20. Creating a sheet to calculate deal popularity by cluster

On the 4MC sheet, you had the assignments to clusters 1 to 4 in row 39. All you need to do to count the deals by cluster is look at the headers of columns H to K on the 4MC - TopDealsByCluster sheet, see which customers on the 4MC sheet were assigned to that cluster in row 39, and then add up their deals in each row. This gives the total number of buyers in that cluster who took each deal.

Let's start with cell H2, which records the number of buyers in cluster 1 who took offer number 1, namely the January Malbec. You need to add up the values of the cells in the range L2:DG2 on the 4MC sheet, but only for buyers in cluster 1, which is a classic use of the SUMIF formula. It looks like this:

=SUMIF('4MC'!$L$39:$DG$39,'4MC - TopDealsByCluster'!H$1,'4MC'!$L2:$DG2)

This formula works like this: you give it a range of values to check, in the first part, '4MC'!$L$39:$DG$39; it compares each of them with the 1 in the column header ('4MC - TopDealsByCluster'!H$1); and then, for each match, it adds in the corresponding value from row 2, given in the third part of the formula, '4MC'!$L2:$DG2.

Notice that you used absolute references ($ in the formula) for everything related to the cluster assignments, for the row number of the column headers, and for the column letters of the deals taken. With these references made absolute, you can drag the formula across the whole of H2:K33 to calculate the deal counts for the other cluster and deal combinations, as in Fig. 21. To make these columns more readable, you can also apply conditional formatting to them.

Fig. 21. Total number of deals taken for each offer, broken down by cluster
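
The SUMIF table can also be sketched as a short Python function. The customers other than Adams, the three-deal vectors, and the cluster assignments below are all made up for illustration, and `deals_by_cluster` is a hypothetical name.

```python
def deals_by_cluster(matrix, assignments, num_clusters):
    """Count, per offer and per cluster, how many customers took the deal.

    matrix: {customer: [0/1 per offer]}
    assignments: {customer: cluster number from 1 to num_clusters}
    Returns counts[offer_index][cluster_index], like cells H2:K33.
    """
    num_offers = len(next(iter(matrix.values())))
    counts = [[0] * num_clusters for _ in range(num_offers)]
    for customer, vector in matrix.items():
        cluster = assignments[customer]
        for offer_index, took_deal in enumerate(vector):
            counts[offer_index][cluster - 1] += took_deal
    return counts


# Invented toy data: three customers, three offers, two clusters
matrix = {"Adams": [1, 0, 1], "Allen": [0, 1, 0], "Baker": [1, 0, 0]}
assignments = {"Adams": 1, "Allen": 2, "Baker": 1}

counts = deals_by_cluster(matrix, assignments, num_clusters=2)
for offer_number, row in enumerate(counts, start=1):
    print(offer_number, row)
```

Sorting the rows of `counts` by one cluster's column, largest first, gives the same "top deals" view that autofiltering produces in the sheet.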

By highlighting columns A through K and applying autofiltering, you can sort this data. By sorting column H from largest to smallest, you can see which deals are the most popular in cluster 1 (Fig. 22).

Fig. 22. Sorting on cluster 1. Pinot, pinot, pinot!

As I mentioned earlier, the four most popular deals for this cluster are pinot. These guys have clearly watched the movie Sideways too many times. If you sort on cluster 2, it becomes absolutely clear that these are the small-volume buyers (Fig. 23).

But when you sort on cluster 3, things are not so easy to understand. The most popular deals can be counted on one hand, and the difference between them and the rest is not so obvious. However, the most popular deals do have something in common - pretty good discounts, 5 of the top 6 deals are for sparkling wine, and France is the country of origin for 3 of the top 4. Still, these observations are inconclusive.

As for cluster 4, these guys clearly liked the August champagne deal for some reason. Also, 5 of the top 6 deals are for French wine, and 9 of the top 10 are high-volume deals. Maybe this is a large-volume cluster gravitating towards French wines? The overlap between clusters 3 and 4 is also troubling.

Next, we consider the segmentation of students by subjective properties (see subsection 14.1) and by benefits (see subsection 14.4) that full-time higher education provides. The segmentation technique is based on cluster analysis, with multidimensional scaling used for additional, more complete analysis.

Segmentation variables (properties and benefits) must have quantitative scores. Nine parameters were used to solve this specific problem. To apply the Likert scale, a corresponding statement is formulated for each parameter.

1. This is the best way to gain deep knowledge.
2. This is an opportunity for full communication and making friends.
3. This is a valuable opportunity to interact with the teacher.
4. This is an important step in starting a career.
5. Student life is a wonderful period in life.
6. The material costs of full-time education are high.
7. The time required for full-time education is high.
8. Develops thinking in the specialty.
9. Full-time education is prestigious.

The set of usable parameters can be much wider. In their questionnaires, students also often mention the following advantages or disadvantages of full-time university study: the opportunity to broaden their horizons, the possibility of deferment, the opportunity to learn self-discipline and self-organization, the difficulty of combining study and work, an important period in life, lack of practice, the opportunity to obtain a large amount of information, influence on further career advancement, the chance to make the right choice of profession later on, and participation in the life of the university.

Data collection

Data collection is carried out using the questionnaire method. The questions are formulated using a Likert scale (see Section 8.3). For example, students were asked about their degree of agreement or disagreement with statements on a five-point scale. The seven-point scale is widely used in the literature, but often the respondent finds it difficult to give answers with a large number of gradations.

A fragment of the questionnaire is shown in Fig. 24.2.

Fig. 24.2.

The respondent only has to put a tick; the conversion to numeric scores is done by the interviewer. A five-point scale with levels from 1 to 5 was used (1 – strongly disagree, ..., 5 – completely agree). Nineteen respondents answered the questionnaire, all students from the same group, which is of course too small a sample.

24.7. Segmentation by properties using the example of an educational product

Calculations using the cluster analysis method

Cluster analysis (see subsection 23.7) is widely used when segmenting by product properties (see subsection 24.3). Segmentation by cluster analysis is sometimes called hierarchical. From the scores obtained, the distance between each pair of students is calculated using the Statistica statistical package. First, a matrix of Euclidean distances is compiled. To form clusters, an agglomerative (merging) procedure with the farthest-neighbor method (complete linkage) was used. The results are presented as a diagram in Fig. 24.3.
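The procedure described here (Euclidean distances, agglomerative merging, complete linkage) can be sketched in a few lines. This is an illustrative sketch only, not the Statistica implementation; the function names and the small score matrix are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two score vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points):
    """Agglomerative clustering with complete (farthest-neighbor) linkage.
    Returns the merge history: one (cluster_a, cluster_b, linkage_distance)
    tuple per step, where clusters are tuples of point indices."""
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: cluster distance is the LARGEST pairwise distance.
            d = max(euclidean(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        history.append((a, b, d))
    return history


# Hypothetical Likert scores for five respondents on three statements.
scores = [(5, 4, 1), (5, 5, 1), (1, 2, 5), (2, 2, 5), (3, 3, 3)]
history = complete_linkage(scores)
for step, (a, b, d) in enumerate(history, start=1):
    print(step, a, b, round(d, 3))
```

The list of linkage distances per step is the same information that a dendrogram and an agglomeration plan display graphically.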

Fig. 24.3. Dendrogram (Statistica PPP)

The vertical axis shows the distance between the clusters being merged (Linkage Distance). Students are listed along the horizontal axis, numbered from C_1 to C_19. As the dendrogram shows, the procedure starts with 19 clusters, one per student. In the first and second steps, points 3 and 5 and points 9 and 11 are merged. In the third step, points 8 and 13 are merged. The merging process then continues.

When choosing the final step and, accordingly, the number of clusters, we use the agglomeration plan (Fig. 24.4). The final version is taken to be a step after which the distance between the clusters being merged (Linkage Distance) increases sharply.

Fig. 24.4.

Let's choose the partition in accordance with the recommendations from subsection 23.7. As the agglomeration plan shows, a relatively sharp increase in the distance between the merged clusters occurs at the 13th and 17th steps (Step in Fig. 24.4). The choice is therefore between the 12th and 16th steps. To select the final step unambiguously, following the same recommendations from subsection 23.7, we turn to multidimensional scaling.
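The rule "stop at the step just before the linkage distance jumps" can be sketched as follows. The distances below are invented for illustration and are not the values from Fig. 24.4; the threshold that counts as a "sharp" increase is also an assumption that the analyst has to choose.

```python
def jumps(linkage_distances):
    """Return (step, increase) pairs: step is 1-based, and increase is the
    growth in linkage distance compared with the previous step."""
    pairs = zip(linkage_distances, linkage_distances[1:])
    return [(step, curr - prev) for step, (prev, curr) in enumerate(pairs, start=2)]


def candidate_stops(linkage_distances, threshold):
    """Steps at which to stop: the step just before each jump above threshold."""
    return [step - 1 for step, inc in jumps(linkage_distances) if inc > threshold]


# Hypothetical linkage distances for the six merge steps of 7 objects.
distances = [1.0, 1.2, 1.4, 3.0, 3.3, 6.5]
stops = candidate_stops(distances, threshold=1.0)

# Each merge step reduces the number of clusters by one.
n_objects = 7
clusters_left = [n_objects - s for s in stops]
print(stops, clusters_left)
```

The same arithmetic applies to the student data: with 19 respondents, stopping after the 16th step leaves 19 - 16 = 3 clusters, and stopping after the 12th step leaves 7.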

Segmentation results using the multidimensional scaling method

Additionally, to select the final classification option, the picture of the relative positions of points is considered using the multidimensional scaling method in Fig. 24.5, which was obtained as a result of working with the Statistica PPP. There are two dimensions along the axes – Dimension 1 and Dimension 2.

The clusters have a convex shape only at the 16th step of the cluster analysis, as can be seen when the boundaries between groups are drawn on the multidimensional scaling plot. These results are accepted as final. Three clusters, which are in effect segments, have been formed. The first cluster includes nine points, the second three, and the third seven.

Fig. 24.5.

Characteristics of segments

Segments can be characterized by average values for each variable, and the results of segmentation can be visually presented in the form of profiles for average values for each variable (Fig. 24.6).

To give a meaningful, concise description of a segment, it is assigned a name and a motto. A complete description of the cluster follows from its profile. The segment name can be based on the variables with the highest and lowest scores, which are easy to see in the profiles. Comparing profiles makes it possible to identify the features of each segment and to “position” it against the others.
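Computing segment profiles amounts to averaging each variable within each cluster and then looking at the extremes. Below is a minimal sketch under assumed data; the scores, cluster labels and variable names are hypothetical and do not reproduce Fig. 24.6.

```python
from statistics import mean


def segment_profiles(scores, labels):
    """Average every variable within every cluster.
    scores: one tuple of scores per respondent; labels: cluster id per respondent."""
    profiles = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(scores, labels) if lab == label]
        profiles[label] = [mean(column) for column in zip(*members)]
    return profiles


def extremes(profile, names):
    """Names of the variables with the highest and the lowest average score."""
    highest = max(range(len(profile)), key=lambda i: profile[i])
    lowest = min(range(len(profile)), key=lambda i: profile[i])
    return names[highest], names[lowest]


# Hypothetical data: four respondents, three statements, two clusters.
names = ["career", "costs", "student life"]
scores = [(5, 2, 4), (4, 1, 4), (2, 5, 3), (1, 4, 3)]
labels = [1, 1, 2, 2]

profiles = segment_profiles(scores, labels)
for label, profile in profiles.items():
    print(label, profile, extremes(profile, names))
```

The pair of highest and lowest variables is only a starting point for naming a segment; the wording of the name and motto remains a judgment call.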

Fig. 24.6.

Let's formulate a name and a motto for each segment obtained. The first segment is the positivists: “Costs are not the main thing”; the second is the lovers of life: “Think about the present. We are not here for prestige and career”; the third is the purposeful: “The prestige pays for the costs.” The segment names were derived as follows.

Indeed, in accordance with Fig. 24.6:

– first cluster: high scores for statements (5) “Student life is a wonderful period in life” and (8) “Develops thinking in the specialty,” with low scores for statements (6) “Material costs are high” and (7) “Time costs are high”;
– second cluster: high scores for statements (2) “The opportunity to fully communicate and make friends” and (5) “Student life is a wonderful period in life,” with low scores for statements (4) “An important step in starting a career” and (9) “Full-time education is prestigious”;
– third cluster: high scores for statements (6) “Material costs are high” and (9) “Full-time education is prestigious,” with relatively low scores for (5) “Student life is a wonderful period in life.”

Here, benefits are conveniently understood as the motives for receiving such an education.
PPP is a package of application programs.
The theory of the method is presented in subsection 23.6.
For a more familiar profile view, you need to rotate it 90° clockwise.

Market segmentation is a formal procedure based on the application of statistical methods of multivariate analysis to research results. Four main methods can be used to obtain market segments:

1. Traditional methods:

– a priori;

– cluster-based.

2. New methods:

– flexible segmentation;

– componential segmentation.

The a priori method of segmenting the consumer market is used when a market segmentation hypothesis can be put forward in advance. This requires an understanding of consumers' needs and wants. Consumer characteristics such as consumption intensity, needs, and key motivational factors and their values act as independent variables, and segmentation variables (age, gender, region, etc.) are used as dependent variables.

Using this method, the researcher initially puts forward a hypothesis of market segmentation, and then tests it during marketing research.

The a priori method of market segmentation includes seven stages:

1. Selecting a segmentation basis. Needs, wants and other factors that influence consumer choice are analyzed.

2. Selecting segmentation variables and developing a market segmentation grid (hypothesis). The criteria and variables for segmenting the consumer market are selected and justified, probable connections between the basis and the variables are sought, and contradictions in the segmentation grid are eliminated.

3. Sampling.

4. Conducting a survey and collecting quantitative data.

5. Forming segments by dividing respondents drawn from among potential buyers into categories.

6. Establishing segment profiles. Market segments are formed and tested against the hypothesis put forward.

7. Developing marketing strategies for each market segment.

The a priori method is the most widely used segmentation method, owing to its simplicity, low cost and the availability of techniques for implementing it. In practice, however, it is often quite difficult to put forward a market segmentation hypothesis.

The cluster method is similar to the a priori method, but it does not define the dependent variable - it looks for natural clusters. First, respondents from among potential buyers are grouped into market segments using an analytical procedure. Then variables are identified that could be used to define the market segment.

When clustering, natural groups are searched, and when classifying, groups are formed according to artificially specified criteria.

Consumer grouping using the AID method is widespread. When using this method, a system-forming criterion is selected. After this, the sample is divided into subgroups, that is, subgroups with a high value of the system-forming criterion are formed.

The disadvantage of this method is the selection of the market segment. The method is labor-intensive and does not guarantee an exact solution.

Segmentation by cluster analysis proceeds bottom-up. At the marketing research stage, many buyer characteristics are identified; a sample of at least 200 units is required. The results are then processed: the data are expressed on a common scale that reflects how strongly each parameter is manifested. Each consumer is then examined, and the consumers most similar to each other are identified. Similar consumers are combined into a cluster, which from then on is treated as a single composite object. Next, the objects most similar to each other are again found and combined into a new cluster. The process ends when no more similar clusters can be identified.

To implement market segmentation using the clustering method, statistical packages such as SPSS and NCSS&PASS can be used in practice.

Flexible market segmentation is a dynamic procedure that involves flexibility in constructing segments based on an analysis of consumer preferences for product alternatives. The conjoint analysis procedure is the basis of flexible segmentation. One of the advantages of this method is that it allows you to fairly accurately determine consumer groups when a new product enters the market. The disadvantages of the flexible segmentation method include high cost, complex implementation procedure, and possible errors at the developer level.

Componential segmentation is based on sophisticated statistical analysis techniques and requires substantial computing resources. The method was proposed by P. Green. It attempts to determine which types of buyers are the best match for particular product characteristics.

According to Western experts, the flexible and componential segmentation methods are purely academic and inapplicable in real life.

As part of the work on the first chapter of the final qualifying work, theoretical knowledge was obtained in the field of consumer market segmentation. The main features of consumer market segmentation are considered. Methods of market segmentation have been studied.

Romanyuk E. V.

Russia, Stavropol, master's degree from the North Caucasus Federal University

Review of cluster analysis methods and assessment of their applicability for solving the problem of consumer market segmentation

Abstract

This article discusses the process of consumer market segmentation, the definition of a decision support system, the use of cluster analysis in various fields of activity, and a common set of cluster analysis methods for solving marketing problems.

Keywords: segmentation, cluster analysis, Data Mining, decision support.

The modern content of the market segmentation process is the result of the evolution of the marketing concept. Before the manufacturer began to consider the market as a differentiated structure depending on consumer groups and consumer properties of the product, his views and consciousness went through various marketing methods: mass, product-differentiated, targeted.

Market segmentation is, on the one hand, a method for finding parts of the market and determining the objects to which the marketing activities of enterprises are directed. On the other hand, it is a management approach to the enterprise’s decision-making process in the market, the basis for choosing the right combination of marketing elements.

The objects of segmentation are, first of all, consumers. Selected in a special way and possessing certain common characteristics, they constitute a market segment. The main focus of marketing is on finding homogeneous groups of consumers who have similar preferences and respond similarly to marketing offers.

For segmentation principles to be implemented successfully, the following conditions must be met:

– the ability of an enterprise (organization) to differentiate the marketing structure (prices, methods of sales promotion, place of sale, products);

– the selected segment must be sufficiently stable, capacious and have growth prospects;

– the enterprise must have data about the selected segment, measure its characteristics and requirements;

– the selected segment must be accessible to the enterprise, i.e., have appropriate sales and distribution channels, a product delivery system;

– the enterprise must have contact with the segment (for example, through personal and mass communication channels);

– the enterprise must assess how well the selected segment is protected from competition and determine competitors' strengths and weaknesses as well as its own competitive advantages.

Thus, only after sufficiently studying the selected segment and assessing its own potential, a manufacturer can decide on choosing a segment.

Data Mining is a multidisciplinary field that arose and is developing on the basis of such sciences as applied statistics, pattern recognition, artificial intelligence, database theory, etc.

Data Mining is a decision support process based on searching for hidden patterns in data.

Data Mining is the process of discovering in raw data previously unknown, non-trivial, practically useful and interpretable knowledge necessary for decision-making in various areas of human activity.

Cluster analysis is used in various fields. It is useful when you need to classify a large amount of information.

In marketing, this could be the task of segmenting competitors and consumers. In marketing research, cluster analysis is used quite widely - both in theoretical research and by practicing marketers who solve problems of grouping various objects. At the same time, questions about groups of customers, products, etc. are resolved. Thus, one of the most important tasks when applying cluster analysis in marketing research is the analysis of consumer behavior, namely: grouping consumers into homogeneous classes to obtain the most complete picture of customer behavior from each group and the factors influencing its behavior.

An important task that cluster analysis can solve is positioning, i.e., determining the niche in which a new product offered on the market should be positioned. As a result of applying cluster analysis, a map is constructed from which one can determine the level of competition in various market segments and the corresponding characteristics of the product for the possibility of entering this segment. By analyzing such a map, it is possible to identify new, unoccupied niches in the market in which existing products can be offered or new ones can be developed.

Data Mining is widely used in the field of marketing.

It helps answer the basic marketing questions: “What is sold?”, “How is it sold?”, “Who is the consumer?” The lecture on classification and clustering problems describes in detail the use of cluster analysis to solve marketing problems such as consumer segmentation.

Another common set of methods for solving marketing problems are methods and algorithms for searching for association rules. The search for temporal patterns is also successfully used here.

In retail trade, as in marketing, the following are used:

– algorithms for searching for association rules (to determine frequently occurring sets of goods that buyers buy at the same time). Identifying such rules helps to place goods on store shelves, develop strategies for purchasing goods and placing them in warehouses, etc.

– use of time sequences, for example, to determine the required volumes of goods in a warehouse.

– classification and clustering methods to identify groups or categories of customers, knowledge of which contributes to the successful promotion of goods.
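The association-rule idea from the first item of this list can be illustrated with the two standard measures, support and confidence, computed for pairs of goods. This is a simplified sketch restricted to pairs, not a full association-rule mining algorithm; the baskets and the function name pair_rules are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support):
    """Find rules 'a -> b' for pairs of goods bought together.
    support    = share of baskets containing both a and b;
    confidence = share of baskets with a that also contain b.
    Returns a list of (a, b, support, confidence) tuples."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical purchase data: one list of goods per receipt.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter"],
]
rules = {(a, b): (s, c) for a, b, s, c in pair_rules(baskets, min_support=0.5)}
for (a, b), (s, c) in sorted(rules.items()):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Pairs with high support and confidence are the candidates for adjacent shelf placement or joint purchasing decisions mentioned above.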


Segmentation Methods

Several "basic" segmentation methods can be identified. The most important of them is consumer cluster analysis (taxonomy). Consumer clusters are formed by grouping together those who give similar answers to the questions asked. Buyers can be grouped into a cluster if they are similar in age, income, habits, etc. Similarity between buyers can be measured in different ways, but a weighted sum of squared differences between buyers' answers to the questions is often used. The output of a clustering algorithm can be a hierarchical tree or a partition of consumers into groups. A fairly large number of clustering algorithms exist.
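The similarity measure mentioned above, a weighted sum of squared differences between two buyers' answers, can be written out directly. In this sketch the answers and the weights are hypothetical; a smaller value means more similar buyers.

```python
def weighted_sq_distance(a, b, weights):
    """Weighted sum of squared differences between two answer vectors.
    A value of 0 means identical answers; larger values mean less similar buyers."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


# Hypothetical answers of two buyers to three questions, and assumed question weights.
buyer_a = (5, 3, 1)
buyer_b = (4, 3, 3)
weights = (1.0, 1.0, 0.5)

distance = weighted_sq_distance(buyer_a, buyer_b, weights)
print(distance)
```

The weights let the analyst decide how much each question contributes to similarity, for example to downweight questions measured on a wider scale.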

For example, a cluster-analysis system called PRIZM is widely used in the USA; it begins clustering by reducing a set of 1000 possible socio-demographic indicators. The system forms socio-demographic segments for the entire territory of the US. One of them is cluster 28: families in this cluster include individuals with the most successful professional or managerial careers. The cluster is also characterized by high income, education and property, and by roughly middle age. Although it represents only 7% of the US population, it is critical for businesses selling high-value goods.

There are other examples of consumer segmentation based on cluster analysis. For example, among the “psychological” segmentation criteria, a very important place is occupied by “the consumer's attitude to the novelty of the product” (Fig. 3).

Figure 3

As can be seen from the above data, the largest number of consumers are ordinary buyers.

Consumer segmentation based on cluster analysis is a “classical” method. There are also methods of market segmentation based on so-called “product segmentation”, that is, segmentation of the market by product parameters. It is especially important when releasing and marketing new products. Product segmentation based on the study of long-term market trends is of particular importance. Developing and producing a new product and completing large investment programs take a fairly long time, so the correctness of the market analysis and of the market capacity estimate matters a great deal here. For a traditional market of standard products, capacity can be calculated with the market summation method. Under modern conditions, to increase its competitiveness and determine market capacity correctly, it is no longer enough for an enterprise to segment the market in only one direction, defining consumer groups by some criteria. As part of integrated marketing, the product itself must also be segmented by the parameters most important for its promotion on the market. For this purpose, the method of compiling functional maps is used: a kind of double segmentation, by product and by consumer.

Functional maps can be single-factor (segmentation is carried out by one factor and for a homogeneous group of products) or multi-factor (analysis of which consumer groups a specific product model is intended for and which of its parameters are most important for promoting the product on the market). Functional maps can be used to determine which market segment a given product is designed for and which functional parameters correspond to particular consumer needs.

When developing new products, this methodology assumes that all factors reflecting the system of consumer preferences, and at the same time the technical parameters of the new product, with which it is possible to satisfy consumer needs, must be taken into account; consumer groups are identified, each with its own set of requests and preferences; all selected factors are ranked in order of importance for each consumer group.

This approach allows you to see already at the development stage which parameters of the product require design improvements, or to determine whether there is a sufficiently capacious market for this model.

Let us give an example of such a market analysis for an Apple computer project under development (Table 1).

Table 1. Segmentation of the personal computer market and factors taken into account when developing products for it (1982)

Factors	Market segments by consumer groups						Model	
	Home	School	University	Home office	Small business	Corporation	A	B
Technical specifications	*	*	***	**	**	**	***	**
Price	***	***	**	***	***	**	0	**
Special qualities	*	*	**	*	*	*	**	*
Reliability	**	*	*	**	**	*	0	**
Ease of use	**	**	*	**	*	0	***	***
Compatibility	0	0	0	0	0	***	0	0
Peripheral equipment	0	0	0	0	0	***	0	0
Software	*	*	**	**	**	***	*	**

*** - very important factor

** - important factor

* - unimportant factor

0 - insignificant factor

This simple analysis shows that Model A is a computer without a market, and Model B is the most suitable product for universities and small businesses.

The company once bet on computer A and lost.

In general, world practice uses two fundamental approaches to marketing segmentation (see the general scheme of segment analysis, Fig. 4).

In the first method, called “a priori”, the segmentation characteristics, the number of segments, their characteristics, and the map of interests are known in advance. That is, the segment groups are assumed to have already been formed. The a priori method is used when segmentation is not part of the current research but serves as an auxiliary basis for solving other marketing problems. It is also sometimes used when market segments are very clearly defined and their variability is low. The a priori method is likewise acceptable when developing a new product aimed at a well-known market segment.

In the second method, called “post hoc” (cluster-based), the segmentation characteristics and the nature of the segments themselves are assumed to be unknown. The researcher first selects a number of variables that are interactive in relation to the respondent (the method involves conducting a survey); then, depending on the attitude expressed towards a certain group of variables, respondents are assigned to the corresponding segment. The map of interests identified in the subsequent analysis is treated as secondary. This method is used to segment consumer markets whose segment structure has not been defined for the product being sold.

A priori segmentation

When choosing the number of segments into which the market should be divided, they are usually guided by the target function - identifying the most promising segment. Obviously, when forming a sample, it is unnecessary to include segments whose purchasing potential is quite small in relation to the product under study. The number of segments, as studies show, should not exceed 10; excess is usually associated with excessive detail of segmentation features and leads to unnecessary “blurring” of features.

For example, when segmenting by income level, it is recommended to divide all potential buyers into segments of equal size, making sure that each segment is no smaller than the estimated volume of sales of services, which is derived from the production capacity of the enterprise. The best illustration of this, and of the possibility of dividing potential consumers into stable segment groups, is segmentation of the population by income, in which the entire population is divided into five groups of 20% each. The distribution of income across these five 20% groups is regularly published in statistical collections and reports, in a form similar to Table 2.

Table 2. Distribution of income by population groups, %

The convenience of working with such segment groups is obvious, especially in terms of tracking their capacity.
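The division into five 20% groups can be sketched as follows. The incomes are invented for illustration and the function name quintile_shares is hypothetical; the sketch also assumes, for simplicity, that the population size is divisible by five.

```python
def quintile_shares(incomes):
    """Split the population, ordered by income, into five equal-sized groups
    and return each group's share of total income in percent (poorest first)."""
    ordered = sorted(incomes)
    n = len(ordered)
    if n == 0 or n % 5:
        raise ValueError("this sketch assumes a non-empty population divisible by 5")
    size = n // 5
    total = sum(ordered)
    return [100 * sum(ordered[i * size:(i + 1) * size]) / total for i in range(5)]


# Hypothetical incomes of ten people.
incomes = [10, 10, 20, 20, 30, 30, 40, 40, 100, 100]
shares = quintile_shares(incomes)
for number, share in enumerate(shares, start=1):
    print(f"group {number}: {share:.1f}% of total income")
```

Because every group contains the same number of people, changes in a group's income share over time translate directly into changes in that segment's purchasing capacity.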