LibraryLongest Common Subsequence

Longest Common Subsequence

Learn about Longest Common Subsequence as part of GATE Computer Science - Algorithms and Data Structures

Understanding Longest Common Subsequence (LCS)

The Longest Common Subsequence (LCS) problem is a fundamental concept in computer science, particularly relevant for competitive exams like GATE CS. It involves finding the longest subsequence common to two given sequences. A subsequence is derived from another sequence by deleting zero or more elements without changing the order of the remaining elements.

What is a Subsequence?

Consider a sequence S. A subsequence of S is obtained by deleting zero or more characters from S. The order of the remaining characters must be preserved. For example, if S = "AGGTAB", then "GTAB" is a subsequence of S, but "GATB" is not (because the 'T' appears before the 'A' in the original sequence).

Is "ABC" a subsequence of "AXBYCZ"?

Yes. We can obtain "ABC" by deleting 'X', 'Y', and 'Z' from "AXBYCZ" while maintaining the order of 'A', 'B', and 'C'.

The Longest Common Subsequence (LCS) Problem

Given two sequences, say X and Y, the LCS problem is to find the longest subsequence that is present in both X and Y. For instance, if X = "AGGTAB" and Y = "GXTXAYB", the LCS is "GTAB".

LCS can be solved efficiently using dynamic programming.

The problem exhibits optimal substructure and overlapping subproblems, making it a prime candidate for dynamic programming. We build a solution by solving smaller subproblems.

The core idea behind the dynamic programming approach is to define a function, say LCS(X[1..m], Y[1..n]), which represents the length of the LCS of the first m characters of X and the first n characters of Y. The recurrence relation is as follows:

  1. If the last characters of both sequences match (i.e., X[m] == Y[n]), then the LCS length is 1 plus the LCS of the sequences excluding their last characters: 1 + LCS(X[1..m-1], Y[1..n-1]).
  2. If the last characters do not match (i.e., X[m] != Y[n]), then the LCS length is the maximum of the LCS of X excluding its last character and Y, and the LCS of X and Y excluding its last character: max(LCS(X[1..m-1], Y[1..n]), LCS(X[1..m], Y[1..n-1])).

This recursive definition can be memoized or implemented iteratively using a 2D table (often called a DP table) to store the results of subproblems, avoiding redundant computations.

Dynamic Programming Approach: The DP Table

We can construct a table

code
dp[m+1][n+1]
where
code
dp[i][j]
stores the length of the LCS of
code
X[1..i]
and
code
Y[1..j]
. The table is filled row by row, column by column, based on the recurrence relation. The final answer will be
code
dp[m][n]
.

Let's visualize the DP table construction for X = "ABC" and Y = "ACB".

Initialize a table dp[4][4] with zeros.

When X[i] == Y[j]: dp[i][j] = 1 + dp[i-1][j-1] When X[i] != Y[j]: dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Example walkthrough:

For X = "ABC" (m=3) and Y = "ACB" (n=3):

Initialize dp table (4x4) with 0s.

| | | A | C | B | |---|---|---|---|---| | | 0 | 0 | 0 | 0 | | A | 0 | 1 | 1 | 1 | | B | 0 | 1 | 1 | 2 | | C | 0 | 1 | 2 | 2 |

Explanation:

  • dp[1][1] (A vs A): 1 + dp[0][0] = 1
  • dp[1][2] (A vs AC): max(dp[0][2], dp[1][1]) = max(0, 1) = 1
  • dp[2][2] (AB vs AC): max(dp[1][2], dp[2][1]) = max(1, 1) = 1
  • dp[3][2] (ABC vs AC): 1 + dp[2][1] = 1 + 1 = 2 (since C matches C)
  • dp[3][3] (ABC vs ACB): max(dp[2][3], dp[3][2]) = max(2, 2) = 2

The LCS length is 2. The actual LCS can be traced back from dp[m][n] by following the path that led to the maximum values.

📚

Text-based content

Library pages focus on text content

Applications of LCS

The LCS problem has numerous applications, including:

  • Bioinformatics: Comparing DNA sequences or protein sequences.
  • Version Control Systems: Detecting differences between file versions (e.g.,
    code
    diff
    utility).
  • Plagiarism Detection: Identifying similarities between documents.
  • Data Compression: Finding common patterns in data.

For competitive exams, understanding the time and space complexity is crucial. The DP approach for LCS has a time complexity of O(mn) and a space complexity of O(mn), where m and n are the lengths of the two sequences.

Key Takeaways for GATE CS

Focus on understanding the recurrence relation, how to construct the DP table, and how to trace back the actual LCS. Be prepared for questions that involve calculating the LCS length for given sequences or analyzing the complexity of the algorithm.

Learning Resources

Longest Common Subsequence - GeeksforGeeks(documentation)

A comprehensive explanation of the LCS problem, including the dynamic programming approach, recurrence relation, and C++ implementation.

Dynamic Programming - Longest Common Subsequence (LCS)(video)

A clear video tutorial explaining the LCS problem and its dynamic programming solution with visual aids.

Longest Common Subsequence Problem Explained(documentation)

Provides a step-by-step explanation of the LCS algorithm, including examples and code snippets in Python.

Introduction to Dynamic Programming(video)

Part of a larger algorithms course, this lecture introduces dynamic programming concepts, which are foundational for understanding LCS.

LCS - Algorithms for Competitive Programming(documentation)

A detailed explanation of LCS tailored for competitive programming, covering optimizations and related problems.

Longest Common Subsequence - Tutorial(tutorial)

A straightforward tutorial on the LCS problem, covering its definition, algorithm, and applications.

Algorithms - Dynamic Programming(paper)

Lecture notes from Princeton University covering dynamic programming, including examples relevant to sequence alignment and LCS.

Longest Common Subsequence(wikipedia)

The Wikipedia page provides a broad overview of the LCS problem, its history, variations, and applications.

Dynamic Programming: LCS(video)

Another excellent video resource that breaks down the LCS problem and its DP solution with clear examples.

GATE CS Syllabus - Algorithms and Data Structures(documentation)

The official syllabus for GATE Computer Science and Information Technology, which lists 'Dynamic Programming' and 'Greedy Algorithms' as key topics.