Unofficial Apyori Documentation

The post below reflects my unofficial docs for the pip-installable Apyori package (on PyPI, on GitHub). I am just a fan of the project and of Association Rule Learning generally, so I thought I’d write up some notes for the community. I am in no way associated with the project, and would like to thank ymoch for all their hard work. If you find any inaccuracies below, please leave a comment.

Loading Transactions (API usage)

Transactions should be an iterable of iterables (e.g. a list of lists). If your transactions are already stored in a variable in this format, apriori() can be called directly on that object. However, if you want to load transactions from a file, you should use:

from apyori import load_transactions


with open('path_to_file') as f:
    transactions = load_transactions(f)

load_transactions returns a generator that reads the file lazily. To view the transactions, or to use them after the file has been closed, convert the generator to a list while the file is still open:

with open('path_to_file') as f:
    transactions = list(load_transactions(f))
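
The reason for converting inside the with block can be sketched with the standard library alone. The lazy_rows function below is a hypothetical stand-in for a lazy loader (it is not Apyori's code), and io.StringIO stands in for a real file. A generator created inside the block but consumed after it tries to read from a file that is already closed.

```python
import csv
import io

def lazy_rows(f):
    """Hypothetical stand-in for a lazy loader: yields rows on demand."""
    for row in csv.reader(f, delimiter="\t"):
        yield row

# Generator created inside the block but consumed after the file is closed.
with io.StringIO("beer\tnuts\n") as f:
    rows = lazy_rows(f)
try:
    list(rows)
    failed = False
except ValueError:
    failed = True  # reading from a closed file raises ValueError

# Generator consumed while the file is still open.
with io.StringIO("beer\tnuts\n") as f:
    loaded = list(lazy_rows(f))

print(failed, loaded)  # True [['beer', 'nuts']]
```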

Note: Avoid calling load_transactions('/path/to/file') with a path string. Because the function is designed to accept any file-like object rather than a path, passing a string will not raise an error but will behave unexpectedly.
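
To see what "unexpectedly" can look like, it helps to look at csv.reader, which the next section notes is used under the hood. csv.reader accepts any iterable of strings, and iterating over a str yields one character at a time, so each character is parsed as its own one-item row. The sketch below uses only the standard library, and the path is a made-up example.

```python
import csv

# Hypothetical path, used only to illustrate the pitfall.
path = "/tmp/data.tsv"

# csv.reader iterates over whatever it is given. A str yields single
# characters, so every character becomes a separate one-field row.
rows = list(csv.reader(path, delimiter="\t"))

print(rows[:4])  # [['/'], ['t'], ['m'], ['p']]
```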

Advanced Usage

Under the hood, this function uses Python’s built-in csv.reader. load_transactions accepts keyword arguments, though as far as I can tell from the source only delimiter is actually passed through to csv.reader. The delimiter is particularly important, as load_transactions defaults to tab-delimited input.

# To load from a csv
with open('path_to_file') as f:
    transactions = load_transactions(f, delimiter=",")
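
The effect of the delimiter can be checked with csv.reader directly. This is a minimal sketch using an in-memory string in place of a real file; the sample data is made up for illustration. With the tab delimiter, a comma-separated line is read as a single item, which would silently give you one-item transactions.

```python
import csv
import io

# In-memory stand-in for a comma-separated file with two transactions.
data = "beer,nuts\nbeer,cheese\n"

# With a tab delimiter, each line is read as one single item.
as_tabs = list(csv.reader(io.StringIO(data), delimiter="\t"))
# With a comma delimiter, each line splits into its individual items.
as_commas = list(csv.reader(io.StringIO(data), delimiter=","))

print(as_tabs)    # [['beer,nuts'], ['beer,cheese']]
print(as_commas)  # [['beer', 'nuts'], ['beer', 'cheese']]
```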

Apriori (API usage)

Running the Apriori algorithm on your transactions is as simple as:

apriori(transactions)

The algorithm has four parameters: min_support (defaults to 0.1), min_confidence (defaults to 0.0), min_lift (defaults to 0.0), and max_length (defaults to None). A realistic parameterization (depending heavily on your data and use case) might look like:

apriori(transactions,
        min_support=0.02,
        min_confidence=0.80,
        min_lift=1.0,
        max_length=None)

What’s returned is a generator of your results. If your data fits into memory and you’d prefer to interact with it that way, you can create a list from the results. E.g.:

list(apriori(transactions,
             min_support=0.02,
             min_confidence=0.80,
             min_lift=1.0,
             max_length=None))

Full Example

from apyori import apriori, load_transactions

# data.csv contains the following two lines,
# one comma-separated transaction per line:
#   beer,nuts
#   beer,cheese

with open('data.csv') as f:
    transactions = load_transactions(f, delimiter=",")
    results = list(apriori(transactions, min_confidence=0.8))

print(results)

[RelationRecord(items=frozenset({'beer'}), support=1.0, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'beer'}), confidence=1.0, lift=1.0)]), RelationRecord(items=frozenset({'cheese', 'beer'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset({'cheese'}), items_add=frozenset({'beer'}), confidence=1.0, lift=1.0)]), RelationRecord(items=frozenset({'beer', 'nuts'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset({'nuts'}), items_add=frozenset({'beer'}), confidence=1.0, lift=1.0)])]
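
The printed list is dense, so it can help to flatten it into one row per rule. The sketch below is only an illustration: to_rows is a hypothetical helper, and the two namedtuples are stand-ins that mirror the field names visible in the output above, so the block runs without Apyori installed. With real results from apriori(), the same attribute names should apply.

```python
from collections import namedtuple

# Stand-ins mirroring the field names visible in the printed output.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

# Two of the records from the output above, rebuilt by hand.
results = [
    RelationRecord(frozenset({'cheese', 'beer'}), 0.5, [
        OrderedStatistic(frozenset({'cheese'}), frozenset({'beer'}), 1.0, 1.0),
    ]),
    RelationRecord(frozenset({'beer', 'nuts'}), 0.5, [
        OrderedStatistic(frozenset({'nuts'}), frozenset({'beer'}), 1.0, 1.0),
    ]),
]

def to_rows(records):
    """Return one (antecedent, consequent, support, confidence, lift) tuple per rule."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append((sorted(stat.items_base), sorted(stat.items_add),
                         record.support, stat.confidence, stat.lift))
    return rows

for base, add, sup, conf, lift in to_rows(results):
    print(f"{base} -> {add}: support={sup}, confidence={conf}, lift={lift}")
```

Sorting the frozensets gives a stable, readable order for multi-item antecedents and consequents.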

CLI Usage

The official documentation provides adequate coverage of the CLI usage.

Understanding Apriori Output

Important Note: Before proceeding beyond this point, please make sure you understand how the algorithm works and all of its parameters. I have given a couple of beginner-level presentations on Association Rule Learning, with in-depth explanations of the Apriori algorithm, slides for which can be found here. There are links to additional resources in the presentation.

Looking at the example found in the docs:

from apyori import apriori

transactions = [
    ['beer', 'nuts'],
    ['beer', 'cheese'],
]
results = list(apriori(transactions))

Our results would appear as a list containing multiple entries such as the one that follows:

RelationRecord(items=frozenset({'beer', 'nuts'}),
               support=0.5,
               ordered_statistics=[OrderedStatistic(items_base=frozenset({'beer'}),
                                                    items_add=frozenset({'nuts'}),
                                                    confidence=0.5,
                                                    lift=1.0),
                                   OrderedStatistic(items_base=frozenset({'nuts'}),
                                                    items_add=frozenset({'beer'}),
                                                    confidence=1.0,
                                                    lift=1.0)])

Each RelationRecord reflects all rules associated with a specific itemset (items) that has relevant rules. Support (support) is the fraction of transactions in which those items appear together, so it is the same for every rule involving those items and appears only once per RelationRecord. The ordered_statistics field is a list of all rules that met our min_confidence and min_lift requirements (set when we called apriori()). Each OrderedStatistic contains the antecedent (items_base) and consequent (items_add) for the rule, as well as the associated confidence and lift.
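
The numbers in the record above can be reproduced by hand from the two transactions. The sketch below uses the textbook definitions of support, confidence and lift in plain Python. The helper functions are my own illustration rather than Apyori's internals, though they do match the values shown.

```python
transactions = [
    ['beer', 'nuts'],
    ['beer', 'cheese'],
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(base, add):
    """support(base and add together) divided by support(base)."""
    return support(set(base) | set(add)) / support(base)

def lift(base, add):
    """confidence(base -> add) divided by support(add)."""
    return confidence(base, add) / support(add)

print(support({'beer', 'nuts'}))       # 0.5
print(confidence({'beer'}, {'nuts'}))  # 0.5
print(confidence({'nuts'}, {'beer'}))  # 1.0
print(lift({'beer'}, {'nuts'}))        # 1.0
print(lift({'nuts'}, {'beer'}))        # 1.0
```

A lift of 1.0 in both directions says that, in this tiny dataset, seeing one item tells you nothing extra about the other beyond its baseline frequency.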
