GitHub Repository Ownership Assumption Caused Commit Sync Failures
How assuming that every repository returned by GitHub belongs to the logged-in user broke commit synchronization, and how proper metadata preservation fixed it.
Problem
Users could log in and synchronize their GitHub repositories without trouble. However, when the system tried to pull the commit history for some of those repositories, the commit sync failed. Either the application behaved as if those repositories had no commits, or the synchronization process failed without a clear explanation.
Symptoms
- Successful initial repository synchronization, but subsequent commit sync tasks failed.
- The API worker logs showed 404 Not Found errors when fetching commits from GitHub.
- Missing commit data for collaborator, organization, or forked repositories in the user dashboard.
- GitHub API request errors specifically for repositories not owned directly by the authenticated user.
Root Cause
The issue was caused by a flawed design assumption about repository ownership.
When a user authenticates via GitHub OAuth, the application fetches their repositories using the following endpoint:
GET https://api.github.com/user/repos
According to the GitHub REST API documentation, this endpoint returns every repository the authenticated user has explicit permission to access, not only the ones they own. This includes:
- Personal repositories (owned by the user)
- Organization repositories (where the user is a member)
- Collaborator repositories (owned by other developers)
- Forked repositories (owned by whichever user or organization holds the fork, which is not always the logged-in user)
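To make the distinction concrete, the sketch below classifies entries from GET /user/repos relative to the logged-in user. It relies only on the owner.login and owner.type ("User" or "Organization") fields that GitHub includes in repository objects. The classifyRepo helper and the portfolio and another-dev/shared-lib fixtures are hypothetical, for illustration only.

```typescript
// Minimal subset of a repository object returned by GET /user/repos.
interface RepoSummary {
  full_name: string;
  owner: { login: string; type: string }; // type is "User" or "Organization"
}

type Affiliation = "personal" | "organization" | "collaborator";

// Hypothetical helper: how does the logged-in user relate to this repository?
function classifyRepo(repo: RepoSummary, loggedInUsername: string): Affiliation {
  // GitHub logins are case-insensitive, so compare them in lower case.
  if (repo.owner.login.toLowerCase() === loggedInUsername.toLowerCase()) {
    return "personal"; // includes forks the user created under their own account
  }
  return repo.owner.type === "Organization" ? "organization" : "collaborator";
}

// Hypothetical fixtures covering each case.
const sample: RepoSummary[] = [
  { full_name: "subhamoydatta703/portfolio", owner: { login: "subhamoydatta703", type: "User" } },
  { full_name: "xyz-hackathon-team/Cheating_Detector", owner: { login: "xyz-hackathon-team", type: "Organization" } },
  { full_name: "another-dev/shared-lib", owner: { login: "another-dev", type: "User" } },
];

for (const repo of sample) {
  // Only "personal" repositories actually live under /repos/{loggedInUsername}/...
  console.log(repo.full_name, classifyRepo(repo, "subhamoydatta703"));
}
```

Only the first fixture would have survived the original URL construction. The other two are the kinds of repositories that produced 404 errors.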
However, when constructing the API endpoint to fetch commits for a repository, the application hardcoded the authenticated user's username as the repository owner:
GET https://api.github.com/repos/{logged_in_username}/{repository_name}/commits
This assumption, built into both the data model and the API integration, is incorrect: the logged-in user does not own every repository they can access.
For example, if the authenticated user subhamoydatta703 has access to an organization repository xyz-hackathon-team/Cheating_Detector, the application incorrectly requested:
GET https://api.github.com/repos/subhamoydatta703/Cheating_Detector/commits
Since subhamoydatta703/Cheating_Detector does not exist (it belongs to xyz-hackathon-team), the GitHub API returned a 404 Not Found error.
Investigation Process
The issue was investigated by tracing the background sync worker execution logs:
- Checked the database records and verified that the repository Cheating_Detector had been saved successfully during the repository sync step.
- Isolated the commit synchronization function and observed it throwing 404 errors for specific repositories.
- Printed the constructed request URLs in the logs, which revealed the mismatched path: /repos/subhamoydatta703/Cheating_Detector/commits.
- Identified the flawed assumption: the code treated the active session user as the owner of every stored repository, failing to account for the organization and collaborator repositories returned by GET /user/repos.
Fix Applied
To resolve the issue, the database schema and synchronization logic were modified to preserve the correct repository owner details from the source API.
- Schema Update: Added owner (the repository owner's username) and fullname (e.g., owner/repo-name) fields to the Repository database model.
- Repository Sync Update: During the repository list sync, the actual owner's login name was extracted from the GitHub API response and stored in the database.
- Commit Sync Update: Modified the endpoint builder to use the stored owner field rather than the logged-in user's username.
// Before:
const url = `/repos/${loggedInUsername}/${repo.name}/commits`;
// After:
const url = `/repos/${repo.owner}/${repo.name}/commits`;
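Putting the "after" line into a small builder keeps the URL logic in one place and makes the ownership rule testable. This is a sketch. StoredRepo and buildCommitsUrl are hypothetical names, and their fields mirror the owner and name columns described above.

```typescript
// Shape of the stored repository record (mirrors the owner/name fields above).
interface StoredRepo {
  owner: string; // e.g. "xyz-hackathon-team"
  name: string; // e.g. "Cheating_Detector"
}

const GITHUB_API = "https://api.github.com";

// Build the commits endpoint from the repository's own namespace.
// The session user's username is deliberately not a parameter.
function buildCommitsUrl(repo: StoredRepo): string {
  if (!repo.owner || !repo.name) {
    // Fail loudly instead of silently producing a path like /repos//name/commits.
    throw new Error(`Repository is missing ownership metadata: ${JSON.stringify(repo)}`);
  }
  const owner = encodeURIComponent(repo.owner);
  const name = encodeURIComponent(repo.name);
  return `${GITHUB_API}/repos/${owner}/${name}/commits`;
}

console.log(buildCommitsUrl({ owner: "xyz-hackathon-team", name: "Cheating_Detector" }));
// https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits
```

Because the username is not an argument, the original bug cannot be reintroduced at any call site that uses this builder.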
Step-by-Step Fix Implementation
Below is the step-by-step implementation, showing how the Prisma schema was updated and how the sync services were rewritten in TypeScript to fix the issue.
Step 1: Update the Database Schema
Add fields to store the repository owner's username and the repository's full namespace (e.g., owner/repo-name).
// filepath: prisma/schema.prisma
// Purpose: Updated Repository model to save ownership details.
// Note: owner and fullname are required, so a table that already contains rows
// needs a default value or a backfill before this migration can run.
model Repository {
  id        String   @id @default(uuid())
  githubId  String   @unique
  name      String   // e.g. "Cheating_Detector"
  owner     String   // e.g. "xyz-hackathon-team"
  fullname  String   // e.g. "xyz-hackathon-team/Cheating_Detector"
  userId    String   // References logged-in user who imported it
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  user      User     @relation(fields: [userId], references: [id])
}
Step 2: Preserve Ownership Data on Sync
When fetching the list of repositories from the GET /user/repos endpoint, map the owner.login field from the API response to the database record.
// filepath: src/services/repoSyncService.ts
// Purpose: Fetch user repositories and save them with correct ownership details.
import { db } from '@/lib/db';
interface GitHubRepoResponse {
  id: number;
  name: string;
  full_name: string;
  owner: {
    login: string;
  };
}
export async function syncUserRepositories(userId: string, accessToken: string) {
  // 1. Fetch repositories the user has access to.
  // GitHub paginates this endpoint; per_page=100 is the maximum page size, so accounts
  // with more repositories still need to follow the Link header for further pages.
  const response = await fetch('https://api.github.com/user/repos?per_page=100', {
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: 'application/vnd.github+json',
    },
  });
  if (!response.ok) throw new Error('Failed to fetch repositories');
  const reposData: GitHubRepoResponse[] = await response.json();
  // 2. Synchronize to database, preserving owner login name
  const syncPromises = reposData.map((repo) => {
    return db.repository.upsert({
      where: {
        githubId: repo.id.toString(),
      },
      update: {
        name: repo.name,
        owner: repo.owner.login, // Preserving ownership metadata
        fullname: repo.full_name,
      },
      create: {
        githubId: repo.id.toString(),
        name: repo.name,
        owner: repo.owner.login, // Preserving ownership metadata
        fullname: repo.full_name,
        userId: userId,
      },
    });
  });
  await Promise.all(syncPromises);
}
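GET /user/repos is paginated (30 results per page by default, up to 100 with per_page), so a single request sees only the first page. GitHub advertises further pages in the Link response header with rel="next". The sketch below follows those links. fetchAllUserRepos and parseNextLink are hypothetical helpers, and the code assumes a runtime with the global fetch API (Node 18+ or a browser).

```typescript
interface GitHubRepoResponse {
  id: number;
  name: string;
  full_name: string;
  owner: { login: string };
}

// Extract the rel="next" URL from a GitHub Link header, if present.
// Example: <https://api.github.com/user/repos?page=2>; rel="next", <...>; rel="last"
function parseNextLink(linkHeader: string | null): string | null {
  if (!linkHeader) return null;
  for (const part of linkHeader.split(",")) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="next"/);
    if (match) return match[1];
  }
  return null;
}

// Follow pagination until GitHub stops returning a "next" link.
async function fetchAllUserRepos(accessToken: string): Promise<GitHubRepoResponse[]> {
  const repos: GitHubRepoResponse[] = [];
  let url: string | null = "https://api.github.com/user/repos?per_page=100";
  while (url) {
    const response: Response = await fetch(url, {
      headers: {
        Authorization: `Bearer ${accessToken}`,
        Accept: "application/vnd.github+json",
      },
    });
    if (!response.ok) throw new Error(`Failed to fetch repositories: ${response.status}`);
    repos.push(...((await response.json()) as GitHubRepoResponse[]));
    url = parseNextLink(response.headers.get("link"));
  }
  return repos;
}
```

syncUserRepositories could call a helper like this in place of its single fetch. The upsert loop that preserves owner.login would stay the same.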
Step 3: Fetch Commits Using Stored Metadata
Modify the commit sync worker to query the endpoint using the stored owner name instead of the session user's username.
// filepath: src/services/commitSyncService.ts
// Purpose: Retrieve commits using stored repository ownership metadata.
import { db } from '@/lib/db';
export async function syncCommitsForRepo(repoId: string, accessToken: string) {
  // 1. Load repository metadata from database
  const repo = await db.repository.findUnique({
    where: { id: repoId },
  });
  if (!repo) throw new Error('Repository not found in database');
  // 2. Query endpoint using correct repository owner and name namespace
  const url = `https://api.github.com/repos/${repo.owner}/${repo.name}/commits`;
  const response = await fetch(url, {
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: 'application/vnd.github+json',
    },
  });
  if (!response.ok) {
    if (response.status === 404) {
      console.error(`Repository namespace error for ${repo.fullname}. Check access permissions.`);
    }
    throw new Error(`Failed to fetch commits from ${url}`);
  }
  const commits = await response.json();
  // 3. Process and save commits locally...
  return commits;
}
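Since a 404 was the main clue in this incident, it can help to turn the commits endpoint's failure codes into actionable messages. This is a minimal sketch, and describeCommitFetchFailure is a hypothetical helper. GitHub answers 404 both for a path that does not exist and for a private repository the token cannot see. The List commits endpoint also documents 409 Conflict, which is what an empty repository typically returns. The 401 and 403 messages are general guesses at likely causes, not an exhaustive mapping.

```typescript
// Hypothetical helper: map a failed commits request to an actionable message.
function describeCommitFetchFailure(status: number, fullname: string): string {
  switch (status) {
    case 401:
      return `Token rejected while reading ${fullname}; the user may need to re-authenticate.`;
    case 403:
      return `Access to ${fullname} was forbidden or rate-limited; inspect the response headers.`;
    case 404:
      // Same status for a wrong owner/name path and for a private repo the token cannot see.
      return `${fullname} was not found; verify the stored owner and the token's access.`;
    case 409:
      // An empty repository has no commits to list.
      return `${fullname} has no commits yet (empty repository).`;
    default:
      return `Unexpected status ${status} while fetching commits for ${fullname}.`;
  }
}

console.log(describeCommitFetchFailure(404, "xyz-hackathon-team/Cheating_Detector"));
```

Treating 409 as "no commits" instead of a sync failure would also keep empty repositories from being confused with the namespace bug described here.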
Why the Fix Works
The GitHub API identifies a repository by its exact owner namespace, so fetching its commits requires that namespace. By saving each repository's owner when it is first imported, the application can query /repos/{owner}/{repo_name}/commits with the correct parameters, whether the repository is personal, shared by a collaborator, or owned by an organization. Because the upsert also rewrites owner and fullname on every repository sync, a repository that is later transferred or renamed is corrected on the next sync.
Lesson Learned
Never assume that data ownership or the namespace of a resource matches the identity of the user who fetched or accessed it.
External APIs often return resources that the user can access rather than resources the user owns. Always preserve the full namespace and ownership metadata when syncing external records into your local database.
Future Prevention
- Preserve Namespaces: Store complete, unique identifiers (like owner and fullname, or the full API URL) when fetching data from external APIs.
- Avoid Constructing URLs from Assumptions: Instead of stitching together unrelated session variables, use the URL templates or metadata returned directly in the API payloads (e.g., the commits_url or url fields in GitHub repository responses; html_url points to the web page, not the API).
- Audit API Permissions vs. Ownership: Carefully review the documentation of third-party endpoints to understand the difference between resource ownership (who owns the resource) and resource access (who can view it).
- Add Validation/Integration Tests: Write integration tests with mock repositories belonging to different owners or organizations to verify endpoint generation logic.
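The testing point can be sketched without network access by injecting a fake fetch that only knows a fixed set of repositories. Everything here is hypothetical scaffolding. FetchLike, fetchCommitsForAll, and the fixture repositories stand in for the real sync worker, and the test checks that every repository, whatever its owner, resolves without a 404.

```typescript
// A fetch-like function type, so tests can inject a fake implementation.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

interface StoredRepo {
  owner: string;
  name: string;
}

// Simplified stand-in for the commit sync worker: one request per repository.
async function fetchCommitsForAll(repos: StoredRepo[], fetchImpl: FetchLike): Promise<void> {
  for (const repo of repos) {
    const url = `https://api.github.com/repos/${repo.owner}/${repo.name}/commits`;
    const response = await fetchImpl(url);
    if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  }
}

// Repositories that exist on the fake "GitHub": personal, organization, collaborator.
const serverRepos = new Set<string>([
  "subhamoydatta703/portfolio",
  "xyz-hackathon-team/Cheating_Detector",
  "another-dev/shared-lib",
]);

// Runs the worker against the fake server and returns every URL it requested.
async function runOwnershipTest(): Promise<string[]> {
  const requested: string[] = [];
  const fakeFetch: FetchLike = async (url) => {
    requested.push(url);
    const fullName = url.replace("https://api.github.com/repos/", "").replace(/\/commits$/, "");
    return serverRepos.has(fullName) ? { ok: true, status: 200 } : { ok: false, status: 404 };
  };
  const repos: StoredRepo[] = Array.from(serverRepos).map((fullName) => {
    const [owner, name] = fullName.split("/");
    return { owner, name };
  });
  await fetchCommitsForAll(repos, fakeFetch); // rejects on any 404
  return requested;
}

runOwnershipTest().then((urls) => console.log(`${urls.length} requests, all resolved`));
```

If the worker built paths from the session username instead, the organization and collaborator fixtures would hit the fake 404 branch and the test would fail.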
Interview Questions
Why does the GitHub API GET /user/repos endpoint return repositories the user does not own?
The GET /user/repos endpoint is designed to return every repository the authenticated user has explicit permission to access. On GitHub, permissions are granted separately from ownership: a developer can hold read or write access to repositories owned by organizations or by other individual users. The endpoint's optional affiliation parameter accepts owner, collaborator, and organization_member and includes all three by default. Returning only personal repositories would prevent users from integrating with the collaborative codebases they actually work on.
What is the difference between Access Permissions and Resource Ownership?
- Access Permissions: Determine who can view, edit, or interact with a resource (e.g. read/write permissions). This is dynamic and can be revoked.
- Resource Ownership: Defines the namespace or authority under which the resource exists (e.g. xyz-hackathon-team/Cheating_Detector). The owner namespace is part of the resource's path identifier; unlike a permission grant, it changes only if the resource is transferred to another account.
What are the best practices for designing database schemas that sync external resources?
- Always Persist Authority/Namespace Metadata: Store the source's unique namespacing identifiers (owner, fullname, namespace) alongside local IDs.
- Never Derive Identifiers from User Sessions: Do not use the active session user's attributes (like username or email) to build paths to other entities.
- Use Source-provided URL Templates: Many API responses include absolute URLs (e.g., commits_url or url fields). Storing and calling these directly avoids manual URL concatenation bugs.
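GitHub repository objects expose commits_url as a URI template of the form https://api.github.com/repos/{owner}/{repo}/commits{/sha}. The sketch below handles only that optional {/sha} segment, and expandCommitsUrl is a hypothetical helper. A general RFC 6570 template library would be the more robust choice. Using this approach would also mean persisting commits_url, which the Repository model shown earlier does not include.

```typescript
// Expand GitHub's commits_url template, e.g.
// "https://api.github.com/repos/octocat/Hello-World/commits{/sha}".
// Only the optional {/sha} path segment is supported (not full RFC 6570).
function expandCommitsUrl(template: string, sha?: string): string {
  const segment = sha ? `/${encodeURIComponent(sha)}` : "";
  return template.replace("{/sha}", segment);
}

const template = "https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits{/sha}";
console.log(expandCommitsUrl(template));
// https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits
console.log(expandCommitsUrl(template, "abc123"));
// https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits/abc123
```

Since GitHub fills in the owner when it generates the template, the application never assembles the namespace itself.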
Key Takeaway
Never conflate the accessing user with the resource owner when integrating with third-party APIs; always persist full ownership and namespace metadata alongside the synced resource in your database schema.