Sets: Creation, operations

Learn about Sets: Creation, operations as part of Python Mastery for Data Science and AI Development

Python Sets: Unordered, Unique Collections

Sets are a fundamental data structure in Python, offering a powerful way to manage collections of unique items. Unlike lists or tuples, sets do not maintain any specific order, and each element within a set must be unique. This makes them ideal for tasks involving membership testing, removing duplicates, and performing mathematical set operations like union, intersection, and difference.

Creating Sets

You can create a set by writing its elements inside curly braces, such as {1, 2, 3}, or by calling the set() constructor. Duplicate elements in a curly-brace literal are removed automatically. The set() constructor takes an optional iterable (like a list or tuple) as an argument; called with no arguments, it returns an empty set. Note that empty braces {} create an empty dictionary, not a set.

What are the two primary ways to create a set in Python?

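Using curly braces around the elements, such as {1, 2, 3}, or the set() constructor. An empty set requires set(), because {} creates an empty dictionary.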

Example of creating sets:

python
# Using curly braces
my_set = {1, 2, 3, 4, 5}
print(my_set)

# With duplicates (duplicates are removed)
duplicate_set = {1, 2, 2, 3, 3, 3}
print(duplicate_set)

# Using the set() constructor with a list
list_data = [10, 20, 30, 20, 10]
set_from_list = set(list_data)
print(set_from_list)
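One pitfall is worth checking directly: empty curly braces produce a dictionary, not a set. The short sketch below (variable names are illustrative only) confirms this and also shows that set() accepts any iterable, including a string.

```python
# Empty braces create a dict; an empty set needs the set() constructor.
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# set() accepts any iterable; a string yields its unique characters.
letters = set("banana")
print(sorted(letters))  # ['a', 'b', 'n']
```

The result is passed through sorted() before printing because a set has no guaranteed display order.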

Key Set Operations

Sets support operations that mirror mathematical set theory. Because sets are hash-based, a membership test takes constant time on average, which keeps these operations fast even on large collections.

Operation             Syntax (Operator)  Syntax (Method)             Description
Union                 A | B              A.union(B)                  Returns a new set with elements from both sets.
Intersection          A & B              A.intersection(B)           Returns a new set with elements common to both sets.
Difference            A - B              A.difference(B)             Returns a new set with elements in A but not in B.
Symmetric Difference  A ^ B              A.symmetric_difference(B)   Returns a new set with elements in either A or B, but not both.
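The four operations in the table can be tried on two small sets. This is a minimal sketch with made-up values. It also illustrates one practical difference between the two syntaxes: the operators need a set on both sides, while the methods accept any iterable.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(sorted(A | B))  # [1, 2, 3, 4, 5, 6]  union
print(sorted(A & B))  # [3, 4]              intersection
print(sorted(A - B))  # [1, 2]              difference
print(sorted(A ^ B))  # [1, 2, 5, 6]        symmetric difference

# The method forms give the same results as the operators.
print(A.union(B) == A | B)  # True

# Methods accept any iterable, such as a list; A | [5, 6] would raise TypeError.
print(sorted(A.union([5, 6])))  # [1, 2, 3, 4, 5, 6]
```

None of these calls modify A or B; each returns a new set.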

Modifying Sets

Sets are mutable, meaning you can add or remove elements. The add() and update() methods add elements, while remove(), discard(), and pop() remove them.

What is the difference between remove() and discard() when removing an element from a set?

remove() raises a KeyError if the element is not found, while discard() does nothing if the element is not present.

Example of modifying sets:

python
my_set = {1, 2, 3}

# Add an element
my_set.add(4)
print(my_set)  # Output: {1, 2, 3, 4}

# Add multiple elements from an iterable
my_set.update([5, 6, 3])
print(my_set)  # Output: {1, 2, 3, 4, 5, 6}

# Remove an element (will raise error if not present)
my_set.remove(5)
print(my_set)  # Output: {1, 2, 3, 4, 6}

# Discard an element (no error if not present)
my_set.discard(7)
print(my_set)  # Output: {1, 2, 3, 4, 6}

# Remove and return an arbitrary element
removed_element = my_set.pop()
print(removed_element)  # e.g., 1
print(my_set)  # e.g., {2, 3, 4, 6}
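The error behavior of the removal methods can be seen by triggering it on purpose. The sketch below uses hypothetical tag names; it shows discard() ignoring a missing element, remove() raising KeyError for the same element, and pop() raising KeyError on an empty set.

```python
tags = {"train", "test"}

# discard() is safe when the element may be absent.
tags.discard("validation")

# remove() raises KeyError for a missing element, so guard it when unsure.
try:
    tags.remove("validation")
except KeyError:
    remove_failed = True
else:
    remove_failed = False
print(remove_failed)  # True

# pop() has nothing to return from an empty set and also raises KeyError.
empty = set()
try:
    empty.pop()
except KeyError:
    pop_failed = True
else:
    pop_failed = False
print(pop_failed)  # True
```

A reasonable rule of thumb is to use discard() when absence is expected and remove() when a missing element would indicate a bug.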

Sets for Data Science and AI

In data science and AI, sets are invaluable for tasks such as:

  • Finding unique values: Quickly identify distinct categories or features in a dataset.
  • Data cleaning: Efficiently remove duplicate records or entries.
  • Feature engineering: Creating new features based on set operations (e.g., finding common attributes between two groups of data).
  • Algorithm implementation: Many machine learning algorithms rely on set-theoretic concepts.
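The first three tasks in the list above can be sketched with plain Python. All data and names here are hypothetical, made up for illustration, and real pipelines would often use a library such as pandas for the same steps.

```python
# Finding unique values: distinct categories in a label column.
labels = ["cat", "dog", "cat", "bird", "dog"]
unique_labels = set(labels)
print(sorted(unique_labels))  # ['bird', 'cat', 'dog']

# Data cleaning: drop duplicates while keeping first-seen order.
seen = set()
deduped = []
for label in labels:
    if label not in seen:  # average constant-time membership test
        seen.add(label)
        deduped.append(label)
print(deduped)  # ['cat', 'dog', 'bird']

# Feature engineering: attributes shared by two groups of data.
group_a_features = {"age", "income", "region"}
group_b_features = {"income", "region", "tenure"}
shared_features = group_a_features & group_b_features
print(sorted(shared_features))  # ['income', 'region']
```

The explicit loop is used for deduplication because converting to a set and back to a list would discard the original order.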

Remember that set elements must be hashable, which in practice means immutable values such as numbers, strings, and tuples that contain only hashable items. Mutable types like lists, dictionaries, and sets themselves cannot be elements of a set.
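This restriction is easy to confirm. The sketch below tries to build a set of lists, which fails with TypeError, then uses tuples instead. It also uses frozenset, the built-in immutable variant of set, as one way to store sets inside a set.

```python
# Lists are mutable and unhashable, so they cannot be set elements.
try:
    invalid = {[1, 2], [3, 4]}
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# Tuples of hashable items work as elements.
pairs = {(1, 2), (3, 4)}
print((1, 2) in pairs)  # True

# frozenset is hashable, so it can be nested inside a set.
groups = {frozenset({1, 2}), frozenset({3})}
print(frozenset({2, 1}) in groups)  # True
```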

Visualizing Set Operations: Imagine two overlapping circles representing sets A and B. The union (A | B) is the entire area covered by either circle. The intersection (A & B) is the overlapping area. The difference (A - B) is the part of circle A that does not overlap with B. The symmetric difference (A ^ B) is the area covered by exactly one of the circles, excluding their overlap.
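The picture of two circles can be checked in code by naming its three regions. This is a small sketch with arbitrary values, and the region names are illustrative.

```python
A = {1, 2, 3}
B = {3, 4, 5}

only_a = A - B    # part of circle A outside B
overlap = A & B   # area shared by both circles
only_b = B - A    # part of circle B outside A

# The three regions together make up the union.
print((only_a | overlap | only_b) == (A | B))  # True

# The symmetric difference is the union without the overlap.
print((A ^ B) == ((A | B) - overlap))  # True

# The regions do not share any elements.
print(only_a.isdisjoint(overlap) and overlap.isdisjoint(only_b))  # True
```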
