Kodai: A Software Architecture and Implementation for Segmentation

The purpose of this thesis is to design and implement a software architecture for segmentation models to improve revenues for a supermarket. The tool supports analysis of supermarket products and generates results that help interpret consumer behavior, giving businesses deeper insight into targeted consumer markets. The software design is named Kodai. Kodai is horizontally reusable and can be adapted across various industries. The framework allows the user to test hypotheses that address the problem of increasing revenues in supermarkets. Kodai offers several advantages, such as analyzing and visualizing data, so that businesses can make better decisions. In addition, Kodai is open source, which means any developer can access the code and adapt it to client requirements. With these features, it is better suited to this task than similar tools such as Gephi, a free visualization and manipulation tool. The retail industry has grown rapidly, resulting in increasing demand for software tools that analyze consumer behavior. The analysis of consumer behavior helps businesses to stay at the forefront of market competition and provide excellent service. By focusing on consumer purchase behavior, Kodai can classify consumers based on variables that capture their behavior. An example is identifying the consumers who spend the most money in a supermarket. Segmentation models provide qualitative and quantitative methods to improve service for the customer and revenues for the company. These models can be used in different fields such as finance, education, and healthcare. Another important feature of Kodai is its interactive and visual modeling capabilities, which help in understanding consumer behavior. Additionally, the software is reusable and supports the integration of future tools, following key extensibility concepts of software design.
This thesis explains the implementation of Kodai as a software architecture through segmentation models using a web-based application that implements software engineering methodology to improve revenues and consumer experience. This tool is developed to facilitate segmentation of consumer data based on purchase behavior with the goal of allowing the user to test a hypothesis to address the problem of increasing revenues in supermarkets. Most importantly, the software is reusable and can be adapted horizontally across various industries.

yEd [3] is a powerful diagram editor that can be used to create diagrams from manually exported data for analysis and to arrange large data sets at the press of a button. It provides an extensive class library for analysis, graph visualization, and network diagrams.
This is an interdisciplinary project that contributes to the disciplines of software engineering and marketing. Many organizations are obsessed with big data, as leading companies are now able to characterize people simply by observing their behavior. In a market analysis environment, the starting point is frequently the introduction of a web application to analyze different segments from collected data. Market segmentation identifies patterns of differences among groups responding to communications, products, and services [6]. The assumption is that if segments can be identified, described, and reached selectively and efficiently, then an organization may increase sales and profits and improve customer experience. Segmentation categorizes people based on a range of variables, which allows for analysis of groups of people. The most frequently used variables are drawn from demographics, behaviors, or benefits sought by the customer. Kodai is empirically driven and based on segmentation models specified by the market analyst. The aim is to segment consumers based on purchase behavior and to identify customers who exhibit similar purchase behaviors.
The above is the central basis for all segmentation. Segmentation can be broadly divided into two different classes [6]: a priori segmentation and post hoc segmentation.
A priori segmentation involves selecting certain groups from a population. Predetermined segments are defined by demographics, psychographics, or some readily observable behavior such as consumer spending. Post hoc segmentation identifies and classifies segments based on actual market investigation and analysis of particular answers to survey questions. Because this software measures only purchase behavior, it can be classified as a priori segmentation.
Our a priori approach is implemented when the consumer data divides the market population into two or more groups. Pre-determined or a priori segmentation involves selecting categories in data based on the business requirements. By selecting certain predetermined groups, we create segments in the dataset. In Kodai, pre-determined segments are selected based on the consumers who have spent the most, revenues of consumers, products, and coupons redeemed.
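As a minimal sketch of the a priori approach described above, the following Python snippet assigns households to predetermined spending segments. The threshold values, segment labels, and sample records are illustrative assumptions; they are not taken from Kodai or from the supermarket dataset.

```python
from collections import defaultdict

# Hypothetical a priori segments: (label, minimum total spend), highest first.
SEGMENTS = [("high", 500.0), ("medium", 100.0), ("low", 0.0)]


def total_spend_by_household(transactions):
    """Sum the sales value per household from (household_key, sales_value) pairs."""
    totals = defaultdict(float)
    for household_key, sales_value in transactions:
        totals[household_key] += sales_value
    return dict(totals)


def assign_segment(total_spend):
    """Return the first predetermined segment whose minimum spend is met."""
    for label, minimum in SEGMENTS:
        if total_spend >= minimum:
            return label
    return SEGMENTS[-1][0]


def segment_households(transactions):
    """Map each household to its a priori spending segment."""
    totals = total_spend_by_household(transactions)
    return {key: assign_segment(spend) for key, spend in totals.items()}


# Made-up sample data, for illustration only.
sample = [(1, 300.0), (1, 250.0), (2, 120.0), (3, 15.5)]
segments = segment_households(sample)
```

The segments are fixed before the data is examined, which is what distinguishes this approach from post hoc segmentation.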
In Kodai, we do not use post hoc segmentation, as it is used for introducing new products in the market. The post hoc method tries to identify segments based on actual investigations, particularly using analysis of answers to survey questions intended to predict marketplace responses. Rather than shaping a product to serve existing consumer behavior as in a priori segmentation, post hoc segmentation is useful for introducing new products based on consumers' opinions. The following gives an outline of selecting segments in a dataset:
It is a well-known maxim in business that 20% of consumers provide 80% of sales [7]. With reference to this statistic, Kodai's goal is to identify areas of growth. It identifies consumers who are increasing their purchases within a two-year period of business. Kodai accomplishes this by building an index from supermarket data through Elastic Search. Elastic Search is a software tool that allows Kodai to send queries to the indexes it builds [8]. These indexes provide efficiency that other products are unable to provide (see Section 1.1).
The main work of this thesis is outlined as follows:
• Applying software architecture by building a web-based tool for segmentation, focusing on the architecture and software development methodology for a tool that supports the analysis of consumer behavior patterns.
• Implementation of the framework for the web-based tool.
• Analysis of the software to test hypotheses regarding consumer buying patterns.
The remainder of this thesis is composed of three chapters:
• First, we discuss background information regarding segmentation models, pricing, and related work, so that readers of any background can follow the thesis.
• Next, the thesis discusses key parts of the implementation. This portion includes details about the software architecture, the software framework, the software methodology, and the technologies used to support segmentation.
• Finally, we discuss the results generated by the tool. All code is uploaded to GitHub [9], a repository and internet hosting service. The chapter ends with concluding remarks and thoughts for future research.

CHAPTER 2 2.1 BUSINESS BACKGROUND READING
In this chapter, we explain the basic background material and tools necessary for the reader. Understanding the business requirements will help the reader to gain comprehension of the requirements for Kodai. Next, we discuss a high-level view of business concepts, and tools needed for Kodai. This involves segmentation models, the supermarket dataset, and software process models.
A segmentation model is an abstract model defined in Kodai based on patterns of differences in the purchasing behavior of consumers. The supermarket dataset used here contains raw data collected from a supermarket [1]. Kodai is applied and tested using this supermarket dataset. This chapter contains an outline of segmentation models as they pertain to some aspects of prices and coupons.

BUSINESS REQUIREMENTS
As this thesis is an interdisciplinary project, it requires expertise in multiple disciplines. Kodai must be able to do the following to satisfy the business requirements:
• The software should be able to quickly identify top consumers of a supermarket from any demographic.
• The software should be able to determine the households that spend the most in a supermarket.
• The software should be able to identify coupons with the highest number of consumer redemptions.
• The software should allow the developer and other users to test out hypotheses to improve revenues.
• The software should be able to run on any operating system, including Windows, Mac, Linux.
• The software should be open source and able to handle at least a gigabyte of raw data.
• The software should be extensible to other similar segmentation applications.
Figure 1. How marketers define segmentation [2]
Segmentation models help us to understand consumer behavior, allowing businesses to improve products and services for consumers.

SEGMENTATION
In Figure 1, marketers define markets and try to understand the value of markets. Next, they determine value propositions, which can be benefits sought by consumers. These values are delivered and monitored, creating an asset base. An asset base refers to the important value created through markets. One example of an asset base is brand recognition among consumers. For example, brands such as Apple, Microsoft, and Google have strong brand recognition that creates consumer loyalty to a specific brand.
Segmentation as a concept did not appear until 1956. The most influential discussion of market segmentation appeared in an article by Wendell Smith, a former president of the American Marketing Association, addressing product strategies and their application. In his paper, Smith rejected the classical economic theory of perfect competition [3]. Perfect competition is defined by a market holding to conditions such as perfect information about buyers, well-defined property rights, and profit maximization by sellers. Variety became the norm of contemporary markets. Smith argued that segmentation worked more efficiently than a strategy of maximizing output or simply producing as many products as possible.
Using a differentiation strategy, the manufacturer would try to make something for everybody, without an in-depth study of any particular group within the market.
Meanwhile, Smith compared product differentiation strategies to taking a layer of a marketing pie chart and segmenting it into slices. A truly successful organization must find segments and then create products and services fitting their needs rather than creating consumer needs or demands.
Segmentation is defined as patterns of differences among a group's responses to communications, products, and services. Here we consider responses to purchases in a supermarket and raw data that we assume is static. The key idea is that different groups have different patterns of responses in a supermarket environment. These distinctions are inferred from analyzing the supermarket dataset, thus following a priori, empirically driven segmentation. Segmentation is broadly classified into a priori and post hoc models.
We briefly discussed these two models of segmentation in Section 1.1.

PRICE
In this chapter, we give a brief outline of price and sample price as defined in the supermarket dataset used by Kodai. In the book Pricing and Revenue Optimization [4], a historical example demonstrates the importance of price in business. During the rule of the Dutch Republic, speculative bubbles such as "Tulipomania" caused prices of tulips to rise more than a hundredfold within 18 months. This raises the question, "What exactly were prices, and how are they defined?" We define price not as intrinsic but rather as based on what consumers perceive: supply and demand. Milton Friedman, in his book Price Theory, describes prices as determined not by any one individual firm but by the market [5]. The supermarket dataset used by Kodai does not contain the price of products, but we can easily configure Kodai to include a sample price in the dataset to determine the potential impact of a price change on revenues. Revenues are the income from all units sold in the dataset.
In the dataset, we target the top 20% of consumers for increasing revenues; by increasing the prices of the products that have sold the most, we are able to gauge the potential revenue increase. We proceed as follows. We define a sample price for items in our dataset. We define a sample price increase variable in Kodai for items in the supermarket. The number of items sold in the dataset is listed under quantity. From these we obtain the sample revenue and the sample increased revenue, and we can then calculate the potential revenue increase.
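The calculation described above can be sketched in Python as follows. This is a minimal illustration, assuming a single sample price applied to every item and assuming that quantities sold do not change when the price rises; the function names and sample values are hypothetical and are not taken from Kodai.

```python
def sample_revenue(quantities, sample_price):
    """Revenue if every unit sold is charged the same sample price."""
    return sum(quantities) * sample_price


def potential_revenue_increase(quantities, sample_price, price_increase):
    """Return (sample revenue, increased revenue, difference).

    price_increase is a fraction, e.g. 0.10 for a 10% increase.
    Assumption: demand is unchanged by the price increase.
    """
    base = sample_revenue(quantities, sample_price)
    increased = sample_revenue(quantities, sample_price * (1 + price_increase))
    return base, increased, increased - base


# Made-up quantities for three best-selling items.
base, increased, gain = potential_revenue_increase([10, 20, 30], 2.0, 0.10)
```

Because the quantities are held fixed, the result is an upper-bound style estimate; price-sensitive consumers may buy less after an increase.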

COUPON
In marketing, a coupon is an incentive or ticket that consumers can use to get financial discounts when purchasing a product. Coupons are part of sales promotions. Coupons are likely to be redeemed by price-sensitive consumers, so software using this data can segment price-sensitive consumers. We assume that buyers who collect coupons are more price sensitive than buyers who do not collect coupons. Therefore, from our hypothesis of price-sensitive consumers, it follows that consumers who do not collect coupons would not be affected by a small increase in prices. In Kodai, we find the products whose coupons have been redeemed the most, along with details about the consumers who have redeemed coupons.

SOFTWARE ENGINEERING BACKGROUND READING
This chapter contains a basic outline of the software engineering process and tools involved in the development of Kodai. We begin with an outline of the supermarket dataset and the data format used. We also describe the software used to develop Kodai, including the programming languages, applications, and software framework. Finally, Kodai is compared to Gephi, a visualization and manipulation tool, to help the reader understand how Kodai better meets both the software and business requirements (see Section 2.2).

SOFTWARE REQUIREMENTS
Software requirements are the computing prerequisites needed for the software to run on any computer and provide the needed service to the customer or user. Kodai must be able to do the following to satisfy the computer science requirements:
• The software must be reusable and applicable across many industries. This is consistent with the principles of software engineering.
• The software must be extensible for future add-ons.
• The software should import data and analyze data according to the business requirements.
The section below explains the specific dataset Kodai uses.

SUPERMARKET DATASET
In this section, we give a thorough explanation of the supermarket dataset and describe the details necessary for the reader to understand it. Segmentation is created from raw datasets. We define data as a collection of values of qualitative or quantitative variables, which is measured, collected, analyzed, and visualized using graphs, images, or other analysis tools. In the supermarket dataset, data is information about the habits of consumers based on demographics, revenues, and coupons. This dataset contains supermarket household-level transactions.
The dataset described below represents household-level transactions collected over two years from 2,500 households. It contains details of households such as unique ID, age category, homeowner status, and household size. Coupon data provides information about specific coupon campaigns sent to households. The data is stored in CSV (comma-separated values) files (see Section 3.4). Figure 2 shows the files and their attributes, and Table 1 lists the file names. The files contain the following information:
• The CAMPAIGN_TABLE file contains information about 1,584 households that received 30 campaigns via mail. A campaign occurs when the business owner de-
• The TRANSACTION_DATA file contains all products purchased by households within this study. It has a household key, basket ID (which identifies the purchase occasion), day, product identification, quantity, sales value, and coupon match. All information pertaining to transactions is contained in this file.
• The PRODUCT file contains information on each product such as product identification, department, manufacturer, brand, and current product size.
• The CAUSAL_DATA file contains information about products that were displayed in a weekly mail or in-store display.
All the above tables are organized and stored in comma-separated file format.

CSV
Digital data is commonly stored in comma-separated values (CSV) files. A CSV file stores tabular data in plain text [2]. Each line of the file is a data record whose fields are separated by commas. All records have the same number of fields in the same order. Because most data processing ap-
Table 2. Comma Separated Value data format [2]
The data shown in Table 2 above can be represented in comma-separated value format as shown in Table 3 below.
Kodai takes CSV files as input, and the software process model guides the developer in building the steps that read these files and process the data in Kodai.
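To illustrate the format, the following sketch reads a few transaction records with Python's standard csv module. The column names follow the attributes listed for the transaction file, but their exact spelling is an assumption, and the three rows are made-up sample data rather than records from the supermarket dataset.

```python
import csv
import io

# Made-up sample in CSV form; the header names are assumed, not confirmed.
SAMPLE_CSV = """household_key,BASKET_ID,DAY,PRODUCT_ID,QUANTITY,SALES_VALUE
1,1001,1,501,2,3.50
2,1002,1,502,1,1.25
1,1003,2,501,1,1.75
"""


def read_transactions(text_stream):
    """Parse CSV text into a list of dictionaries with typed values."""
    records = []
    for row in csv.DictReader(text_stream):
        records.append({
            "household_key": int(row["household_key"]),
            "BASKET_ID": int(row["BASKET_ID"]),
            "DAY": int(row["DAY"]),
            "PRODUCT_ID": int(row["PRODUCT_ID"]),
            "QUANTITY": int(row["QUANTITY"]),
            "SALES_VALUE": float(row["SALES_VALUE"]),
        })
    return records


records = read_transactions(io.StringIO(SAMPLE_CSV))
```

For a file on disk, the same function can be given an open file object, for example one returned by open(path, newline="").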

SOFTWARE PROCESS MODEL
Every software project needs thorough planning and detailed design before a developer begins to code. Without a plan, developers cannot know the direction and goal of their software. A software process model helps developers to systematize and plan their software development process. A software process model is an abstract representation of a software development process. It presents a description of the process from some particular perspective. We integrated two software models in developing this software, the wa-
The Spiral software development model is a type of software process model. It proceeds through the following phases of development: identification of business requirements, design (which involves architectural design), construction or building (which involves production of the actual software), and finally evaluation and risk analysis. Each step in the spiral process can be revisited and repeated to examine risks at each stage. It is perhaps too cumbersome for small software products and is better suited to medium- to large-scale software products. As Kodai is a small software framework, the spiral software process model was not chosen.

HYBRID-AGILE WATERFALL MODEL
Figure 3. Phases in Hybrid Agile Waterfall model [3]
In the Hybrid Agile Waterfall model, there is an iterative step at every step of the waterfall methodology. We start with basic requirement identification; in Kodai, we identified that some of our product requirements need a framework to allow access to supermarket data; the user interface especially needs to be a web-based application that can be hosted in the cloud.
Next, the initial prototype consists of the development of the basic requirements mentioned above in Section 2.2. This includes user interfaces with high-level functions, such as the ability to view results in a web browser and to reuse the software for different disciplines. In Kodai, we enable a given supermarket business to collect and analyze data and to better understand its consumers through the capabilities of the software. Finally, revising and enhancing the prototype focuses on reviewing feedback and incorporating features into our new prototype; this includes functions to view the trend of consumer visits in a supermarket. Agile is well suited to new technology such as web applications and offers flexibility for changes.
By using both the Waterfall and Agile methodologies, we involve users of Kodai in the production process even before implementation. As the working model is displayed, the user gets a better understanding of the system being developed. The waterfall model also helps to increase quality by catching possible design flaws at the testing stage.

GITHUB
Once a software methodology has been chosen, all software requires iterative updates and versions as development takes place. In order to document and store all our code for Kodai, we need version control and a code-hosting platform. We use GitHub for this purpose. GitHub [4] is a code-hosting platform for version control and collaboration. It lets software developers work together on projects from anywhere. A repository is usually used to organize a single project and contains folders, files, spreadsheets, and datasets. Kodai has all the necessary files stored in GitHub. The link to the software repository: https://github.com/ludwigwittgenstein2/supermarket_elasticsearch.
We use GitHub to maintain version control and support future development of Kodai, and a Web framework to help developers build efficient software.

WEB FRAMEWORK
In this section, we explain Web frameworks in general and Django, the specific framework used to develop Kodai. This gives the reader sufficient knowledge of Web frameworks and Django. A Web framework is an implemented, extensible abstraction of the software development tools required for developers to build web applications. A web framework encapsulates developers' experience from over twenty years [5]. It supports the development of web applications that are used by clients on the internet. These frameworks make it easier to reuse common HTTP operations and structure, so that other developers with knowledge of a framework can quickly build and maintain the application. The common operations that can be performed with the Django framework are session storage, database manipulation, security, URL routing, and access to JavaScript Object Notation (JSON). Session storage and retrieval help developers store information about users' browsing activity and later retrieve it to identify unique users. Database manipulation helps developers to update and remove data stored in a database. Security against cross-site request forgery helps to prevent common attacks that attempt to gain access to usernames and passwords within the web application. URL routing lets developers quickly categorize URLs within Django. Django gives developers access to all of these commonly performed operations.

DJANGO
In Django's documentation, the authors define Django as "an open-source, high-level Python Web framework that encourages rapid development and clean, pragmatic design" [6]. Django allows us to create complex database-driven websites, emphasizing reusability, pluggability of components, rapid development, and prevention of unnecessary repetition. Django is designed to help developers take an application from concept to
Templates return HTML pages. After performing any requested tasks, the view returns an HTTP response object to the web browser.
Figure 4. Architecture describing overall framework of Django [7]
Flask is a micro-framework for Python [8]. It can be used to develop a web-based application, but it does not support Elastic Search. Compared to Flask, Django has the most active community, with more than 80,000 developers with blogs. Django provides a full-featured Model-View-Controller framework and could be used to make an extensible application. Django's REST framework generates pages to browse and execute all APIs, so we can execute GETs and POSTs quickly and test them in the browser. We therefore chose Django because it meets our requirements and allows us to use Elastic Search. It is written in the Python programming language.

PYTHON
Python is a high-level programming language that emphasizes code readability, meaning the syntax is closer to written English. Python also has a large and comprehensive standard library with extensive documentation [9]. Python was chosen to develop Kodai because it complements the use of Django and Elastic Search. In Kodai, Python supports importing the Django and Elastic Search libraries. For these reasons, Python is a more prudent choice for Kodai than other programming languages such as Java and C++.

ELASTIC SEARCH
In the business requirements from Section 2.2, we established that our software must be able to quickly find top consumers in a supermarket. To meet this requirement, Kodai needs tools for sending queries. We use Elastic Search to transform the raw supermarket data into an index. Elastic Search is a distributed search engine with a RESTful API. A distributed search engine has no central server, and a query is distributed among several connected computers over a network. "RESTful API is a service that supports HTTP methods, to create, retrieve, update and delete access to service's resources" [10]. It is used by developers to access information from a web application. This information might include the web application's usage statistics, clicks, and the number of users.
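As a hedged illustration of the kind of query involved, the sketch below builds a request body in the Elastic Search query language that groups transactions by household and orders the groups by total sales value. The field names household_key and SALES_VALUE and the aggregation names are assumptions for illustration; the body is only constructed here, not sent, and it is not necessarily the query Kodai uses.

```python
import json


def top_households_query(size=10, household_field="household_key",
                         value_field="SALES_VALUE"):
    """Build a search body: terms aggregation ordered by summed sales value."""
    return {
        "size": 0,  # return only aggregation results, no individual documents
        "aggs": {
            "top_households": {
                "terms": {
                    "field": household_field,
                    "size": size,
                    "order": {"total_spend": "desc"},
                },
                "aggs": {
                    "total_spend": {"sum": {"field": value_field}},
                },
            }
        },
    }


query_body = top_households_query(size=5)
query_json = json.dumps(query_body, indent=2)
```

The resulting JSON would be sent over HTTP to the search endpoint of an index, in line with the RESTful API described above.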
Elastic Search indexes raw data and lets us perform queries and combine many types of searches, so we use it to explore trends and patterns in our data. The software is distributed, which means that indices are divided into shards. A shard is a horizontal partition of data in a search engine. Related data is often stored in the same index.
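The idea of horizontal partitioning can be sketched as follows: each document is routed to one shard by hashing a routing value and taking the remainder modulo the number of shards. This is a conceptual sketch only; CRC32 is used here as a stand-in hash, and the search engine's own hash function and routing details differ.

```python
import zlib
from collections import defaultdict


def shard_for(routing_value, number_of_shards):
    """Pick a shard by hashing the routing value (CRC32 is a stand-in hash)."""
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % number_of_shards


def partition(document_ids, number_of_shards):
    """Group document IDs by the shard they would be routed to."""
    shards = defaultdict(list)
    for document_id in document_ids:
        shards[shard_for(document_id, number_of_shards)].append(document_id)
    return dict(shards)


# Twenty made-up document IDs spread over three shards.
layout = partition(range(1, 21), number_of_shards=3)
```

Because the shard is a pure function of the routing value, the same document is always found on the same shard, which is what allows a query to be distributed across machines.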
Below is an outline of Elastic Search's architecture:
Figure 5. Architecture of Elastic Search Data provider [11]
In Figure 5, indexes are stored sets of information about data. Multiple indexes allow the developer to send real-time queries about the data. Document orientation means that each record is stored as a JavaScript Object Notation (JSON) document under a unique ID, which allows Python to access the data. By using Elastic Search, we are able to build a 1-gigabyte comma-separated file of raw supermarket data into an easily accessible index. We use Elastic Search because it supports sending queries from the Python language. To see sample results from Elastic Search, we use the Elastic Search Head Plugin.
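The step from tabular records to JSON documents with unique IDs can be sketched as below. The snippet formats records as newline-delimited JSON in the shape expected by the bulk indexing endpoint: one action line followed by one document line per record. The index name "transactions", the choice of BASKET_ID as the document ID, and the sample records are assumptions for illustration.

```python
import json


def bulk_lines(records, index_name, id_field):
    """Yield an action line and a source line for each record."""
    for record in records:
        action = {"index": {"_index": index_name, "_id": str(record[id_field])}}
        yield json.dumps(action)
        yield json.dumps(record)


def bulk_body(records, index_name, id_field):
    """Join the lines into one request body; it must end with a newline."""
    return "\n".join(bulk_lines(records, index_name, id_field)) + "\n"


# Made-up records standing in for rows read from a CSV file.
sample_records = [
    {"BASKET_ID": 1001, "household_key": 1, "SALES_VALUE": 3.5},
    {"BASKET_ID": 1002, "household_key": 2, "SALES_VALUE": 1.25},
]
body = bulk_body(sample_records, index_name="transactions", id_field="BASKET_ID")
```

Sending records in batches like this, rather than one request per row, is a common choice when loading a large file into an index.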

ELASTIC SEARCH HEAD PLUGIN
The Elastic Search head is a web frontend for browsing and interacting with indexed data [12], featuring major operations such as:
We use Elastic Search to index and view the data. Our main goal is to extract actionable knowledge. Elastic Search helps to explore data in a short time and is capable of scaling to petabytes of structured and unstructured data. By indexing, we can access our data at high speed without the need to create a database.

WHY KODAI IS BETTER
In this section, we first describe Gephi, a visualization software. Next, we compare Gephi to Kodai against the business requirements given above.
Gephi is a visualization and manipulation tool [13]. It is a tool for data analysts and scientists keen to explore and understand graphs. Gephi is similar to Adobe Photoshop, but for graph data: the user can interact with representations, manipulate structures and colors, and reveal hidden patterns. It can be used for exploratory data analysis and visual analytics.
In Gephi, the task of uploading and manipulating data depends on the installation of the software. In Figure 6 above, Gephi runs out of memory while uploading the transaction file. It cannot process SQL queries to manipulate data. Kodai is open source and can run on a server without installation on a specific system. Consider the TRANSACTION_DATA file from Figure 2; to upload this file to Gephi, the average
A developer can use the same file, index all files in Kodai using Elastic Search, manipulate the transaction data, and view the results. Gephi would not be able to meet the requirement to find top consumers in supermarket data.

FLEXIBILITY
Gephi has three panes (Overview, Data Laboratory, and Preview), which provide an overview of the data, a data laboratory, and previews to display the results. In order for developers to take features in Gephi and implement them according to their respective requirements, they would need to uninstall the software and change features within Gephi.
In Kodai, we can create our own flexible view of results by implementing different queries using Elastic Search, without having to uninstall the software. Thus, Kodai has more flexibility than Gephi.

MEMORY
Gephi depends on a local computer's random access memory to run the software.
Uploading a comma-separated file of transaction data from the supermarket dataset takes an average of 10 minutes in Gephi, compared with an average of 3 minutes in Kodai. With Kodai, we can store data as an index, which makes queries faster to run and results easier to visualize according to the requirements. Finally, we are also able to use Kodai through Amazon Cloud or other servers.

REUSABLE
Gephi can be used for a variety of different data but is not reusable for a specific purpose such as calculating top revenues, top products, and top coupons redeemed. With our supermarket data, Kodai is able to calculate top revenues, top products purchased, and top coupons redeemed. Therefore, Kodai is reusable and meets both the business and software requirements, whereas Gephi is not able to meet these requirements.

Gephi is constrained by a limitation on large files. Kodai could be horizontally scalable by deploying an Amazon Cloud server and creating clusters within Elastic Search, which is part of our software. Due to Gephi's failure to satisfy these requirements, Kodai provides a more suitable framework for data analysis and visualization for researchers and businesses.

CHAPTER 4 ARCHITECTURE OF KODAI
In this section, we describe the architecture of Kodai. We begin with the general description of software architecture, then we show differences between an architecture and framework. Next, we describe the behavioral aspects of Kodai. Finally, we show top coupons, one feature of Kodai, to understand the implementation.

DIFFERENCE BETWEEN ARCHITECTURE AND FRAMEWORK
Software architecture refers to the structure of software solutions needed to solve technical and operational problems. In Kodai, architecture refers to the guiding principles and code components for applying segmentation models to improve revenues. The goal of a software architecture is to build a bridge between business requirements and technical requirements. In software architectures, the structure of the system is exposed, but the implementation is hidden. Architectures are built to support change in the design; Kodai is built to support future changes, reusability, and the development of new features [1].
A software framework refers to a set of software libraries that address a general domain purpose such as a web application. A framework is an extensible implementation that can be used to solve problems as we build an application or system. In Kodai, Django is the web application framework used to implement web-based tools and applications.
Most complex systems need a solid foundation; likewise, Kodai requires a solid foundation in software architecture. Failing to consider software architecture will likely create unstable software in the long run. A software architecture allows extension of software to multiple other domains. It represents an abstraction of a system that allows mutual understanding, negotiation, and communication among software stakeholders. An architecture is transferable and reusable for future implementations [2].  And, instead of sample revenue increase, we might implement it as yield per increase of crops.

A BEHAVIORAL DESCRIPTION OF KODAI:
In this section, we describe the behavioral activity within Kodai. A behavioral activity describes an orchestrated, repeatable pattern of processes in a software system. In the figure below, Kodai's activity diagram [3] explains the flow of control through the structure of the system. In the first step, the user opens the Kodai web application and clicks Create Segments. This sends a request to the Django framework, which handles and matches URL requests. Once Django matches the URL request, it generates an authorization request and sends it to Elastic Search. In Elastic Search, the request is sent to an index, which returns dictionary data. This data is returned to the Django views and displayed in the browser.
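The shape of this flow can be imitated in a few lines of plain Python. The sketch below is not Django or Elastic Search code: the URL pattern, the view functions, and the in-memory dictionary standing in for an index are all hypothetical stand-ins, used only to show the order of the steps (match the URL, query the index, return the data to the view).

```python
import re

# Hypothetical in-memory stand-in for an index of coupon data.
FAKE_INDEX = {
    "coupons": [
        {"coupon": "C10", "redeemed": 12},
        {"coupon": "C20", "redeemed": 7},
    ]
}


def query_index(index_name):
    """Stand-in for sending a query to an index and receiving dictionaries."""
    return FAKE_INDEX.get(index_name, [])


def create_segments_view(path):
    """Stand-in view: fetch the data and wrap it in a response."""
    return {"status": 200, "body": query_index("coupons")}


def not_found_view(path):
    """Stand-in response for a URL that matches no pattern."""
    return {"status": 404, "body": []}


# Hypothetical routing table: (compiled URL pattern, view function).
URL_PATTERNS = [(re.compile(r"^/segments/create/?$"), create_segments_view)]


def handle_request(path):
    """Match the path against the patterns and call the first matching view."""
    for pattern, view in URL_PATTERNS:
        if pattern.match(path):
            return view(path)
    return not_found_view(path)


response = handle_request("/segments/create/")
```

In the real system, the routing and view steps are performed by the web framework and the query step by the search engine; only the sequence is mirrored here.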

KODAI - TOP COUPONS REDEEMED
We implement the Top Coupons Redeemed feature through Kodai. The dataset contains the coupons redeemed by consumers. A coupon is an advertisement that entitles consumers to a certain benefit when they purchase products and redeem the coupon.
'TOP COUPONS REDEEMED' displays the top coupons redeemed by consumers in the data. This allows businesses to identify their most valuable coupons and products, and to target consumers to improve revenues. This feature requires tools such as Elastic Search and Django to run locally on our server.
As we described in Section 1.1, we use a priori segmentation based on business requirements. An a priori segment is defined by assuming preconceived categories before looking at the data. In this instance, Top Coupons Redeemed, we use the usage rates and occasions of coupons in the data.
We begin by explaining an outline of how this function works.
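As a minimal sketch of the idea, and not Kodai's actual code, ranking coupons by how often they are redeemed can be expressed with a counter. The record layout and the field name coupon_upc are assumptions made for illustration, and the sample rows are invented.

```python
from collections import Counter

# Hypothetical redemption records; each row is one redeemed coupon.
redemptions = [
    {"household_key": 1, "coupon_upc": "C100"},
    {"household_key": 2, "coupon_upc": "C100"},
    {"household_key": 3, "coupon_upc": "C200"},
    {"household_key": 1, "coupon_upc": "C200"},
    {"household_key": 4, "coupon_upc": "C100"},
    {"household_key": 5, "coupon_upc": "C300"},
]


def top_coupons(rows, n=3):
    """Return the n most frequently redeemed coupons as (coupon, count) pairs."""
    counts = Counter(row["coupon_upc"] for row in rows)
    return counts.most_common(n)


print(top_coupons(redemptions, n=2))
```

The usage rate of each coupon is the a priori category here: the grouping variable is chosen before the data is inspected.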

IMPLEMENTATION OF KODAI
In the last chapter, we described how Kodai is a software architecture for segmentation models. In this chapter, we describe Kodai's implementation through software frameworks and tools. We begin by explaining the software implementation, the software development model, and the details of an example component in the system from an implementation perspective. At the end, we describe the process to achieve results. The software implementation section explains the implementation details of Kodai. It allows software developers to evaluate the tools necessary to develop a web application for the business requirements.
Software methodology provides guidelines for developers to successfully implement the requirements for software development. We use a hybrid waterfall-agile methodology to help us develop the software, as explained in Section 3.5.1 and Section 3.5.2. While the implementation involves many components, we chose to explain only the 'Top Consumer' feature of Kodai, as the other components of the software were developed using a similar methodology.

SOFTWARE IMPLEMENTATION
We explain the implementation of Kodai through a high-level architecture. A high-level architecture is a description of the structure of the Kodai software [1]. It specifies how we will develop the important components from the business requirements.

SOFTWARE FLOW DIAGRAM
The software flow diagram represents the activities performed in the software. It is a dynamic outline of the activities contained in the software. It helps us to develop the important activities that the software needs in order to meet the business requirements. In addition to meeting the requirements, it allows the developer to understand the flow of activities in the software. Below is a high-level software flow diagram.
The flow diagram begins with uploading raw data (comma-separated files) and constructing an Elastic Search index from it. This index data can be viewed with the Elastic Search head plugin. The software developer can upload a large volume of raw data and

SOFTWARE ENGINEERING MODEL
In the diagram below, we describe the hybrid agile-waterfall steps used to develop Kodai.
The software engineering model provides the steps necessary for developers to implement this software. In order for the architecture to meet the requirements, we iterated constantly. We used this process in building similar features of the software. In developing the whole software architecture for Kodai, we first built the 'Top Users by Revenue' feature as a prototype and then iterated through various prototypes. Below we describe the steps we used to develop the features of Kodai that help the user analyze revenues using segmentation.

REQUIREMENTS
In the requirements phase, we collected the important business requirements for developing Kodai. As we explained above in Section 2.2, the important requirements for the software are the ability to import raw data, to view data, and to allow the software developer to view different segments of the data.

IMPLEMENTATION
In this section, we explain one feature of Kodai developed to meet the criteria of the business requirements. The developed feature allows the software to display the top consumers in supermarket data. Kodai acts as an open-source data analytics framework that lets users from various disciplines view and understand their data. The data analytics framework gives users a platform to easily transform raw data into understandable categories according to their requirements. As this software is open-source, any software developer can reuse the framework and add features.

5.3.2.a TOP USERS BY REVENUE
As discussed at the beginning of this chapter, we explain here the implementation of the Top Consumers by Revenue feature. Since the other features are implemented similarly, this serves as a representative example. Top Consumers by Revenue displays the gross revenue generated by each consumer. This allows businesses to identify their most valuable customers and to target their main profit base.
The user will need to install Django 1.9.1 and Elastic Search 2.4.4; refer to the Appendix for the installation of Django [2]. We developed a Python script that builds an index. For example, to build an index for a comma-separated file such as 'transaction.csv', we would store this file as a variable within the script. The Python script would access the raw data [3], sort it, and store it using Elastic Search in an easily searchable format.
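A minimal sketch of such a script, assuming a hypothetical column layout, might turn each CSV row into a document ready for bulk indexing. The column names other than household_key, the index name "transactions", the type name "transaction" and the sample rows are all assumptions for illustration and are not taken from Kodai's script.

```python
import csv
import io

# Hypothetical sample standing in for the contents of 'transaction.csv'.
SAMPLE_CSV = (
    "household_key,PRODUCT_ID,QUANTITY,SALES_VALUE\n"
    "1,1001,2,3.98\n"
    "2,1002,1,1.50\n"
)


def build_actions(csv_file, index_name="transactions", doc_type="transaction"):
    """Convert CSV rows into bulk-indexing actions (one dictionary per row)."""
    actions = []
    for row in csv.DictReader(csv_file):
        actions.append({
            "_index": index_name,
            "_type": doc_type,
            "_source": {
                "household_key": int(row["household_key"]),
                "PRODUCT_ID": int(row["PRODUCT_ID"]),
                "QUANTITY": int(row["QUANTITY"]),
                "SALES_VALUE": float(row["SALES_VALUE"]),
            },
        })
    return actions


actions = build_actions(io.StringIO(SAMPLE_CSV))
# With the elasticsearch Python client installed and a server running, a list
# like this could be sent with elasticsearch.helpers.bulk(client, actions).
print(len(actions), "documents prepared")
```

Keeping the conversion separate from the network call makes it possible to check the documents before anything is written to the index.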
In the Python script, we import the transaction comma-separated file from our raw data and store it as a variable to build an index. To view and understand this index, we use the Elastic Search head plugin [4]. Once we have built an index from the above files, we use Python to send a query to the Elastic Search index and store the response as a variable. Elastic Search receives this query from Python and sends back the results as a dictionary.
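The query itself is not reproduced in this text. As a hedged sketch, a query of this kind could be written as a Python dictionary in the Elastic Search query DSL, grouping documents by household and summing a sales field. The field name SALES_VALUE, the aggregation name revenue and the function name build_top_consumers_query are assumptions for illustration.

```python
def build_top_consumers_query(top_n=10):
    """Build a terms aggregation that ranks households by summed sales."""
    return {
        "size": 0,  # return only aggregation results, no individual documents
        "aggs": {
            "household_key": {
                "terms": {
                    "field": "household_key",
                    "size": top_n,
                    # Sort the buckets by the summed revenue, largest first.
                    "order": {"revenue": "desc"},
                },
                "aggs": {
                    "revenue": {"sum": {"field": "SALES_VALUE"}},
                },
            }
        },
    }


query = build_top_consumers_query(top_n=5)
# With the elasticsearch Python client, a body like this could be passed to
# client.search(index="transactions", body=query); that call is not run here.
print(query["aggs"]["household_key"]["terms"])
```

Naming the outer aggregation household_key means the response can be read under the same key.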

Elastic Search Dictionary
In the above example of an Elastic Search dictionary, we can see how Elastic Search builds a dictionary from an index. Each dictionary entry is a key-value pair. To access this dictionary, we define the key that we need for our feature in a Python script. For example, in Top Visits, we name the dictionary key 'household_key' in our for loop to access household keys from products.

for product in products['aggregations']['household_key']['buckets']:
After Elastic Search has returned these results in the form of a dictionary, the same Python script allows us to access the keys and values. The results are then passed to the Django template file, which renders them in an HTML file.
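To make the loop concrete, the following sketch walks over a hand-written dictionary shaped like a terms aggregation response. The numbers are invented, and the revenue sub-aggregation and the output field names are assumptions for illustration.

```python
# Hypothetical response, shaped like an Elastic Search terms aggregation result.
products = {
    "aggregations": {
        "household_key": {
            "buckets": [
                {"key": 1023, "doc_count": 42, "revenue": {"value": 310.75}},
                {"key": 718, "doc_count": 35, "revenue": {"value": 255.10}},
            ]
        }
    }
}

top_consumers = []
for product in products["aggregations"]["household_key"]["buckets"]:
    top_consumers.append({
        "household": product["key"],          # the household identifier
        "transactions": product["doc_count"],  # number of matching documents
        "revenue": product["revenue"]["value"],
    })

# In a Django view, a list like this could be handed to a template as context,
# for example render(request, "top_consumers.html", {"consumers": top_consumers}).
for row in top_consumers:
    print(row["household"], row["transactions"], row["revenue"])
```

Each bucket corresponds to one household, so the resulting list already has one row per consumer for the template to display.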

USAGE AND APPLICATION
Our tool is simple and easy for developers to understand and apply to various needs. It can be used in the education, agriculture, real estate, financial, and retail industries.

EDUCATION SOFTWARE
While our software is designed to import supermarket data and allow the developer to experiment with ways to improve revenues, it is not limited to this specific purpose. For example, it could also aid educational institutions in the admissions process. In an interview with Cynthia Bonn, the dean of Higher Education for Admission at the University of Rhode Island, the following functions of the software were determined to be reusable in her work:
• Finding likely students who would be admitted to URI
• Enabling the dean of admissions to have a software framework to work on student data collected from high schools
• Storing the collected data to track trends in admissions

AGRICULTURE SOFTWARE
In addition to education, our software would allow farmers to improve crop yield and overall agricultural efficiency. The following functions of the software could be used to improve productivity by:
• Providing a software framework for accessing crop data
• Storing the collected crop data to track trends
• Enabling the farmer to experiment with strategies to improve farming revenues

FINANCE SOFTWARE
In banking, our software could allow bankers to safely store the clients' usage data, and improve public relations with clients. This can be done through:
• Accessing client activity
• Providing a software framework to quickly access client data
• Enabling the bankers to track client behavior
• Enabling the bankers to experiment with strategies to improve banking revenues

SOFTWARE REUSE
Although the software was initially intended for analyzing supermarket data, it is highly reusable, with significant applications in agriculture, education, and finance. The software can be used in tracking school admissions, crop production, and banking client trends, in addition to its primary purpose in supermarkets. Therefore, the software crosses domains and is horizontally reusable. As such, it meets the criteria for software reusability.

RESULTS
As we proposed, we developed Kodai as a software architecture for segmentation models, applied to improving revenues in a supermarket. This allows users to upload data and experiment with revenue-increasing strategies, and allows the business to effectively access and analyze data through our software. We analyzed supermarket data, found the top consumers, top units of products, and top coupons, and hypothesized how to improve revenues with Kodai.
In addition to supporting experiments with revenue-increasing strategies, our software is reusable, meeting the criteria of the business requirements [1].

IMPROVEMENT OF REVENUES
The following hypothetical example helps to show how Kodai could support experiments that explore increasing revenue. Kodai allows a developer to easily test this hypothetical example by adding an additional column, which takes approximately 5 minutes. This is implemented in a way similar to that explained in 3.3.2.a 'Top Users by Revenue.' Because our dataset lacks prices, we assign a sample price of $2 to each of the top twenty items. We then add additional columns in the software that allow us to experiment with a revenue increase. Next, we hypothesize an increase of 10 cents for each of the items. We select one product from the top products by units sold feature.
Let us assume the cost of each pound of bananas is $2. In our dataset, we assign the value 2 to the variable SAMPLE_PRICE for bananas. The quantity of bananas sold in the dataset is 29,760 and the assigned price per pound is $2; therefore, the total sample revenue for bananas is $59,520. Now, we increase the sample price of each pound of bananas by 10 cents in our dataset, assigning the value 2.10 to the sample banana price. The total sample revenue for bananas after the 10-cent increase is therefore $62,496.
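The arithmetic in this example can be checked with a few lines of Python. This is an illustrative sketch rather than Kodai's code; the function name sample_revenue is hypothetical, and decimal arithmetic is used so that the currency values stay exact.

```python
from decimal import Decimal

UNITS_SOLD = 29760  # quantity of bananas sold, as stated in the text


def sample_revenue(units, price):
    """Total sample revenue for a product: units sold times the sample price."""
    return units * Decimal(price)


baseline = sample_revenue(UNITS_SOLD, "2.00")  # original sample price
raised = sample_revenue(UNITS_SOLD, "2.10")    # price raised by 10 cents

print("Baseline revenue:", baseline)
print("Revenue after increase:", raised)
print("Difference:", raised - baseline)
```

The two totals match the $59,520 and $62,496 given above, a difference of $2,976, under the assumption that the quantity sold does not change when the price rises.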
However, if we collect purchase data over time, it might indicate that increasing the price of bananas decreases the volume of sales. This is referred to as price elasticity.
Price elasticity is a measure of the relationship between a change in quantity and a change in price. If the developer is able to upload subsequent measurements of quantity and price, Kodai can measure price elasticity. Hence, our software can be used to test this hypothesis of how to improve revenues.
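One common way to compute this measure is the ratio of the percentage change in quantity to the percentage change in price. The sketch below assumes that simple definition; the function name price_elasticity and the second quantity measurement (28,272) are hypothetical and do not come from our dataset.

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """Percentage change in quantity divided by percentage change in price."""
    if q_old == 0 or p_old == 0:
        raise ValueError("the original quantity and price must be non-zero")
    if p_new == p_old:
        raise ValueError("the price must change to measure elasticity")
    pct_change_quantity = (q_new - q_old) / q_old
    pct_change_price = (p_new - p_old) / p_old
    return pct_change_quantity / pct_change_price


# Hypothetical second measurement: sales fall from 29,760 to 28,272 units
# after the price rises from $2.00 to $2.10.
elasticity = price_elasticity(29760, 28272, 2.00, 2.10)
print(round(elasticity, 2))
```

In this hypothetical case, a 5% price increase is matched by a 5% drop in quantity, so the elasticity is about -1 and the revenue would be $59,371.20, slightly below the $59,520 baseline.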

SOFTWARE AND ARCHITECTURAL REUSABILITY
As mentioned in the previous section, Kodai is not limited to supermarket data; it can be reused in fields such as education, finance, and agriculture. New features can be easily developed and added to our software through Django [2].
In agriculture, Kodai can be used to analyze crop data. In order to use Kodai, the farmer needs to have crop data. This can be collected through devices such as Lidar sensors and drones, or through human observation [3]. In the recent age of big data, there is no lack of available crop data. However, a platform to understand and study the data using segmentation models is not available. In Kodai, the objective was revenues; in agriculture, the farmer will be able to apply segmentation models to find crop yield instead of revenues.
Figure 17 Kodai is able to test and add price increase column
In education, Kodai can be used to find patterns of student behavior, such as the likelihood of students passing a class. In order to use Kodai, educators need to have data on students' behavior and performance. Using Kodai as a platform, educators would be able to apply segmentation models to determine how they might help most students succeed in their classes and programs.
In finance, Kodai can be used as a platform to find client activity, trends, and revenue improvements for banks. If the bank has data on consumers' activity, then it is possible for the bank to use Kodai. Kodai's features such as Top Consumers and Top Products can be adapted to find Top Clients by Usage and Top Clients by Revenue. Using Kodai, bankers will be able to apply segmentation models to find trends and methods to improve revenues.
For each of the applications outlined above, the user would need to collect the data to be explored using Kodai and place it in CSV format.

CONCLUSION
Our work in this thesis helps to show that Kodai is a software architecture for segmentation models to improve revenues for a supermarket. It can test a hypothesis to address the problem of increasing revenues in supermarkets. We achieved this by allowing the user to upload supermarket data within the software framework, using a web-based application that implements software engineering methodology, and by allowing the developer to build business requirements into the software. This has been shown through our implementation in Chapter 5. The following tables explain how Kodai met both software and business requirements.

SECURITY
Security is the quality or state of being secure. In any exchange of data or transaction, there is a vulnerability to hijacking. To protect against the hijacking of data or transactions, various security measures are required. In our software, uploading and data analysis can be openly accessed. If Kodai is run in the cloud, there is a high security risk of any user accessing it through the internet. To prevent such access, we can add secure login and access control to our software. This would protect our software from unauthorized access.

USER PROFILE
In order to stop unauthorized access, Kodai could focus on the implementation of user profiles. A user is any person who uses our software. A user profile stores information for each user and can customize individual data. User profiles help to individualize data and content for each user. This can be implemented further through the Django framework.

MACHINE LEARNING ALGORITHMS
Kodai provides a platform to visualize, analyze, and find details about supermarket data.
Our future work will focus on integrating machine learning algorithms into Kodai. Machine learning enables a program to learn through experience. A machine learning algorithm learns by analyzing large amounts of data [2]. This allows the development of systems that can automatically adapt and customize recommendations to individual users. Such a system learns from data rather than from a predetermined model. Machine learning can be classified into supervised and unsupervised learning. In supervised learning, we have a set of input variables (x) and an output variable, from which an inferred function is produced to be used on later examples. Unsupervised learning analyzes sets of unlabeled data for patterns. Supervised learning focuses on classification and regression. In Figure 18, classification techniques predict categorical responses. In Kodai, classification techniques could be applied to predict whether a consumer is likely to visit the supermarket. Regression involves predicting continuous values; for example, changes in weather might be correlated with consumers buying some products more than others. In the event of a snowstorm, there is a greater likelihood that customers will buy milk, bread, and snow shovels. Clustering is the most common form of unsupervised learning. It is used for exploratory data analysis.
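As an illustration of clustering, the following sketch groups consumers by weekly spend using a small one-dimensional k-means (Lloyd's algorithm) written in plain Python. It is not part of Kodai; the spend values, the starting centroids and the function name kmeans_1d are assumptions chosen for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster is empty.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical weekly spend per consumer, in dollars.
weekly_spend = [12, 15, 14, 80, 85, 90, 40, 42]
centroids, clusters = kmeans_1d(weekly_spend, centroids=[10, 50, 100])
print(clusters)
```

No labels are supplied: the low, medium and high spending groups emerge from the data alone, which is what distinguishes this from the supervised techniques described above.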
In Kodai, future work should focus on supervised learning to predict which coupons to individualize for each consumer. Supervised learning can classify an output variable based on an input dataset. This can be implemented by adding data science machine learning packages such as GraphLab. GraphLab is a Python package that allows developers to implement machine learning algorithms. This would help in building predictive models within Kodai.

DATA VISUALIZATION
In addition to predictive models built through a machine learning package, Kodai needs future work on data visualization to help users understand the results of those models. Kodai includes buying trends built on the Graphos library for Django. However, this could be extended to predictive models of purchases.
Figure 19 Kodai Weekly Visit Trend
In Figure 19, Kodai shows the weekly visit trend of customers using data visualization. Data visualization is the use of graphics to communicate the results of an analysis to the user. This can be implemented through data visualization packages such as the D3 JavaScript library, which helps developers produce dynamic, interactive data visualizations in web browsers.
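Before a trend such as weekly visits can be charted, the raw records have to be reduced to one value per week. A minimal sketch of that step follows. The field name week_no and the sample records are assumptions, and the header-plus-rows table is a layout that charting libraries commonly accept; whether Kodai passes its data to Graphos in exactly this form is not stated here.

```python
from collections import Counter

# Hypothetical visit records: one entry per store visit.
visits = [
    {"household_key": 1, "week_no": 1},
    {"household_key": 2, "week_no": 1},
    {"household_key": 1, "week_no": 2},
    {"household_key": 3, "week_no": 2},
    {"household_key": 2, "week_no": 2},
    {"household_key": 1, "week_no": 3},
]


def weekly_visit_table(rows):
    """Count visits per week and return a header row followed by data rows."""
    counts = Counter(row["week_no"] for row in rows)
    table = [["Week", "Visits"]]
    for week in sorted(counts):
        table.append([week, counts[week]])
    return table


for line in weekly_visit_table(visits):
    print(line)
```

The same table could equally be serialized to JSON for a browser-side library such as D3.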

DEVELOPING AN INTERFACE TO ANALYZE BASKETS OF PURCHASE
Kodai will be enhanced with data visualization tools to analyze baskets of purchases. This permits the identification of loss leaders. A loss leader is a pricing strategy in which a product is sold at a price below its market cost to attract new customers.
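One simple starting point for basket analysis is to count how often pairs of products appear in the same basket. The following sketch assumes baskets are available as lists of product names; the sample baskets and the function name pair_counts are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the products bought together in one visit.
baskets = [
    ["bananas", "milk", "bread"],
    ["bananas", "milk"],
    ["milk", "bread"],
    ["bananas", "milk", "eggs"],
]


def pair_counts(all_baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in all_baskets:
        # Sort and de-duplicate so each pair has one canonical form per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
print(counts.most_common(2))
```

Pairs that occur frequently suggest which products tend to be bought alongside a discounted item, which is the kind of evidence needed to judge whether a candidate loss leader draws additional purchases.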

CAPTURING PRICE ELASTICITY OVER TIME
As we mentioned in Improvement of Revenues in Section 6.1, Kodai would be able to measure price elasticity. Price elasticity is a measure of the relationship between a change in quantity purchased and a change in price. If the developer uploads subsequent measurements of quantity and price, Kodai will be able to measure price elasticity.

CHAPTER 8 APPENDIX
This chapter explains how to install the software tools necessary to develop a system similar to Kodai.

DJANGO INSTALLATION
The user will need to install Django 1.9.1. In order to run Django, we require the user to follow these steps: a) Open Terminal or Command Line, and run the following:

'python manage.py runserver 8080'
This command uses Python to run the Django server locally on port 8080.
The user will need to install Elastic Search 2.4.4. We run Elastic Search through the following:
Next, we must use Elastic Search to build an index that contains any necessary raw data for the Top Consumer by Revenue feature described in 3.3.2.a. This index stores raw data and makes it searchable. Each entry in the index must be assigned a type, and each type has specific properties.
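As a sketch of what a type with specific properties can look like, the following builds an index definition as a Python dictionary in the mapping format used by Elastic Search 2.x. The type name "transaction", the field names other than household_key, and the function name build_index_settings are assumptions for illustration and are not taken from Kodai's configuration.

```python
def build_index_settings(doc_type="transaction"):
    """Describe one document type and the data type of each of its properties."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "household_key": {"type": "integer"},
                    "PRODUCT_ID": {"type": "integer"},
                    "QUANTITY": {"type": "integer"},
                    "SALES_VALUE": {"type": "float"},
                }
            }
        }
    }


settings = build_index_settings()
# With the elasticsearch Python client, a body like this could be passed to
# client.indices.create(index="transactions", body=settings); not run here.
print(sorted(settings["mappings"]["transaction"]["properties"]))
```

Declaring numeric types up front allows fields such as SALES_VALUE to be summed in aggregations rather than treated as text.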

PYTHON SCRIPT FOR TOP CONSUMERS
Below is the Python script that we developed for Top Consumers: