apis · June 9, 2026 · 6 min read

GitHub Repository Ownership Assumption Caused Commit Sync Failures

How assuming that every repository returned by GitHub belongs to the logged-in user broke commit synchronization, and how proper metadata preservation fixed it.

Problem

From the user's perspective, logging in and synchronizing the GitHub repository list worked. However, when the system then tried to pull the commit history, some of those repositories failed to sync: the application either behaved as if they had no commits, or the synchronization task failed outright without a clear explanation.

Symptoms

Successful initial repository synchronization, but subsequent commit sync tasks failed.
The API worker logs showed 404 Not Found errors when fetching commits from GitHub.
Missing commit data for collaborator and organization repositories (including forks that live under an organization) in the user dashboard.
GitHub API request errors specifically for repositories not owned directly by the authenticated user.

Root Cause

The issue was caused by a flawed design assumption about repository ownership.

When a user authenticates via GitHub OAuth, the application fetches their repositories using the following endpoint:

GET https://api.github.com/user/repos

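According to the GitHub REST API documentation, this endpoint lists the repositories the authenticated user has explicit permission to access, not only the ones they own. With the default affiliation setting (owner, collaborator, organization_member), this includes:

Personal repositories (owned by the user)
Organization repositories (where the user is a member)
Collaborator repositories (owned by other developers)
Forked repositories (a fork under the user's own account is owned by the user; a fork under an organization is owned by that organization)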
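To make the distinction concrete, here is a minimal sketch that splits a GET /user/repos result into repositories the viewer owns and repositories they can merely access. The helper name partitionByOwnership, the trimmed RepoSummary type, and the notes repository are illustrative assumptions; only owner.login, owner.type, name, and full_name come from the GitHub payload.

```typescript
// Trimmed-down shape of one GET /user/repos item; real responses carry many more fields.
interface RepoSummary {
  name: string;
  full_name: string;
  owner: { login: string; type: string }; // type is e.g. "User" or "Organization"
}

// Hypothetical helper: separate repositories owned by the viewer from the rest.
// GitHub logins are case-insensitive, so compare them in lower case.
function partitionByOwnership(repos: RepoSummary[], viewerLogin: string) {
  const viewer = viewerLogin.toLowerCase();
  const owned: RepoSummary[] = [];
  const notOwned: RepoSummary[] = [];
  for (const repo of repos) {
    if (repo.owner.login.toLowerCase() === viewer) {
      owned.push(repo);
    } else {
      notOwned.push(repo);
    }
  }
  return { owned, notOwned };
}

// Sample data: the "notes" repository is invented for illustration.
const sampleRepos: RepoSummary[] = [
  {
    name: 'notes',
    full_name: 'subhamoydatta703/notes',
    owner: { login: 'subhamoydatta703', type: 'User' },
  },
  {
    name: 'Cheating_Detector',
    full_name: 'xyz-hackathon-team/Cheating_Detector',
    owner: { login: 'xyz-hackathon-team', type: 'Organization' },
  },
];

const { owned, notOwned } = partitionByOwnership(sampleRepos, 'subhamoydatta703');
console.log('owned:', owned.map((r) => r.full_name));
console.log('not owned:', notOwned.map((r) => r.full_name));
```

Any repository that lands in notOwned is one where a URL built from the viewer's login would point at the wrong namespace.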

However, when constructing the API endpoint to fetch commits for a repository, the application hardcoded the authenticated user's username as the repository owner:

GET https://api.github.com/repos/{logged_in_username}/{repository_name}/commits

This assumption is wrong at both the data-modeling and the API-integration level: the logged-in user does not own every repository they can access.

For example, if the authenticated user subhamoydatta703 has access to an organization repository xyz-hackathon-team/Cheating_Detector, the application incorrectly requested:

GET https://api.github.com/repos/subhamoydatta703/Cheating_Detector/commits

Since subhamoydatta703/Cheating_Detector does not exist (it belongs to xyz-hackathon-team), the GitHub API returned a 404 Not Found error.

Broken Flow (Hardcoded Username)

GET /repos/subhamoydatta703/Cheating_Detector/commits
Assumes logged-in user owns everything
✗ 404 Not Found

Fixed Flow (Preserved Owner Metadata)

GET /repos/xyz-hackathon-team/Cheating_Detector/commits
Uses repository owner stored in database
✓ 200 OK

Investigation Process

The issue was investigated by tracing the background sync worker execution logs:

Checked the database records and verified that the repository Cheating_Detector had been successfully saved during the repository sync step.
Isolated the commit synchronization function and observed it throwing 404 errors for specific repositories.
Printed the constructed request URLs in the logs, revealing the mismatched path: /repos/subhamoydatta703/Cheating_Detector/commits.
Realized the flawed assumption: the code assumed every stored repository was owned by the active session user, failing to account for the organization and collaborator repositories returned by GET /user/repos.
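The step that exposed the bug was printing each constructed URL next to its response status. A minimal sketch of such a logging wrapper follows; withRequestLogging, FetchLike, and the fake fetch are hypothetical names introduced for illustration, and the fake stands in for the real network call so the sketch can run offline.

```typescript
// Minimal shapes so the sketch does not depend on a particular fetch implementation.
interface MinimalResponse {
  ok: boolean;
  status: number;
}
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<MinimalResponse>;

// Hypothetical wrapper: log every outgoing URL together with the status it produced.
function withRequestLogging(fetchImpl: FetchLike, log: (line: string) => void): FetchLike {
  return async (url, init) => {
    const response = await fetchImpl(url, init);
    log(`GET ${url} -> ${response.status}`);
    return response;
  };
}

// Fake fetch for the demo: only the real owner's namespace resolves.
const fakeFetch: FetchLike = async (url) =>
  url.includes('/repos/xyz-hackathon-team/Cheating_Detector/')
    ? { ok: true, status: 200 }
    : { ok: false, status: 404 };

async function demo(): Promise<void> {
  const logged = withRequestLogging(fakeFetch, (line) => console.log(line));
  // Path built from the session user: the mismatch is visible in the log line.
  await logged('https://api.github.com/repos/subhamoydatta703/Cheating_Detector/commits');
  // Path built from the real owner.
  await logged('https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits');
}

void demo();
```

Seeing the owner segment of the path side by side with the 404 is usually enough to spot a namespace mismatch without stepping through the worker.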

Fix Applied

To resolve the issue, the database schema and synchronization logic were modified to preserve the correct repository owner details from the source API.

Schema Update: Added owner (the repository owner's username) and fullname (e.g., owner/repo-name) fields to the Repository database model.
Repository Sync Update: During the repository list sync, the actual owner's login name was extracted from the GitHub API response and stored in the database.
Commit Sync Update: Modified the endpoint builder to use the stored owner field rather than the logged-in user's username.

// Before:
const url = `/repos/${loggedInUsername}/${repo.name}/commits`;

// After:
const url = `/repos/${repo.owner}/${repo.name}/commits`;

Step-by-Step Fix Implementation

Below is the step-by-step implementation showing how the schema updates and sync service APIs were rewritten in TypeScript to fix the issue.

Step 1: Update the Database Schema

Add fields to store the repository owner's username and the repository's full namespace (e.g., owner/repo-name).

// filepath: prisma/schema.prisma
// Purpose: Updated Repository model to save ownership details.

model Repository {
  id        String   @id @default(uuid())
  githubId  String   @unique
  name      String   // e.g. "Cheating_Detector"
  owner     String   // e.g. "xyz-hackathon-team"
  fullname  String   // e.g. "xyz-hackathon-team/Cheating_Detector"
  userId    String   // References logged-in user who imported it
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  user      User     @relation(fields: [userId], references: [id])
}

Step 2: Preserve Ownership Data on Sync

When fetching the list of repositories from the GET /user/repos endpoint, map the owner.login field from the API response to the database record.

// filepath: src/services/repoSyncService.ts
// Purpose: Fetch user repositories and save them with correct ownership details.

import { db } from '@/lib/db';

interface GitHubRepoResponse {
  id: number;
  name: string;
  full_name: string;
  owner: {
    login: string;
  };
}

export async function syncUserRepositories(userId: string, accessToken: string) {
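  // 1. Fetch repositories the user has access to. GitHub paginates this
  //    endpoint (30 items per page by default, 100 at most), so keep
  //    requesting pages until a short page signals the end.
  const perPage = 100;
  const reposData: GitHubRepoResponse[] = [];
  for (let page = 1; ; page++) {
    const response = await fetch(
      `https://api.github.com/user/repos?per_page=${perPage}&page=${page}`,
      {
        headers: {
          Authorization: `Bearer ${accessToken}`,
          Accept: 'application/vnd.github+json',
        },
      },
    );

    if (!response.ok) {
      throw new Error(`Failed to fetch repositories (HTTP ${response.status})`);
    }
    const batch: GitHubRepoResponse[] = await response.json();
    reposData.push(...batch);
    if (batch.length < perPage) break;
  }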

  // 2. Synchronize to database, preserving owner login name
  const syncPromises = reposData.map((repo) => {
    return db.repository.upsert({
      where: {
        githubId: repo.id.toString(),
      },
      update: {
        name: repo.name,
        owner: repo.owner.login, // Preserving ownership metadata
        fullname: repo.full_name,
      },
      create: {
        githubId: repo.id.toString(),
        name: repo.name,
        owner: repo.owner.login, // Preserving ownership metadata
        fullname: repo.full_name,
        userId: userId,
      },
    });
  });

  await Promise.all(syncPromises);
}
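One way to keep this mapping easy to verify is to pull it out of the database call into a pure function. The sketch below assumes a RepositoryRecord shape mirroring the schema fields from Step 1; toRepositoryRecord and the numeric id are illustrative, not part of the original service.

```typescript
interface GitHubRepoResponse {
  id: number;
  name: string;
  full_name: string;
  owner: { login: string };
}

// Assumed local shape, mirroring the Repository model fields from Step 1.
interface RepositoryRecord {
  githubId: string;
  name: string;
  owner: string;
  fullname: string;
  userId: string;
}

// Hypothetical pure mapper: the owner always comes from the payload,
// never from the session user who triggered the import.
function toRepositoryRecord(repo: GitHubRepoResponse, userId: string): RepositoryRecord {
  return {
    githubId: repo.id.toString(),
    name: repo.name,
    owner: repo.owner.login,
    fullname: repo.full_name,
    userId,
  };
}

// The id below is a made-up value for illustration.
const record = toRepositoryRecord(
  {
    id: 123456,
    name: 'Cheating_Detector',
    full_name: 'xyz-hackathon-team/Cheating_Detector',
    owner: { login: 'xyz-hackathon-team' },
  },
  'local-user-id',
);
console.log(record.owner, record.fullname);
```

Because the function has no I/O, a unit test can cover organization and collaborator payloads without touching GitHub or the database.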

Step 3: Fetch Commits Using Stored Metadata

Modify the commit sync worker to query the endpoint using the stored owner name instead of the session user's username.

// filepath: src/services/commitSyncService.ts
// Purpose: Retrieve commits using stored repository ownership metadata.

import { db } from '@/lib/db';

export async function syncCommitsForRepo(repoId: string, accessToken: string) {
  // 1. Load repository metadata from database
  const repo = await db.repository.findUnique({
    where: { id: repoId },
  });

  if (!repo) throw new Error('Repository not found in database');

  // 2. Query endpoint using correct repository owner and name namespace
  const url = `https://api.github.com/repos/${repo.owner}/${repo.name}/commits`;

  const response = await fetch(url, {
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: 'application/vnd.github+json',
    },
  });

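  if (!response.ok) {
    if (response.status === 404) {
      // GitHub also answers 404 when the token cannot see a private repository.
      console.error(`Repository ${repo.fullname} was not found or is not accessible. Check the stored owner and access permissions.`);
    }
    throw new Error(`Failed to fetch commits from ${url} (HTTP ${response.status})`);
  }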

  const commits = await response.json();
  // 3. Process and save commits locally...
  return commits;
}
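Since the original symptom was a sync that failed without a clear explanation, it may help to translate the response status into a named outcome before deciding what to do. The sketch below is one possible classification, not the worker's actual logic: it assumes 409 means the repository is empty, 404 means the path is wrong or the token lacks access, and 403 or 429 means a permission or rate-limit problem. The type and function names are hypothetical.

```typescript
// Hypothetical outcome type for a commits request.
type CommitFetchOutcome =
  | { kind: 'ok' }
  | { kind: 'empty-repository' }
  | { kind: 'not-found-or-no-access' }
  | { kind: 'forbidden-or-rate-limited' }
  | { kind: 'unexpected'; status: number };

// Map an HTTP status from GET /repos/{owner}/{repo}/commits to an outcome.
// The mapping encodes the assumptions stated in the lead-in.
function classifyCommitFetchStatus(status: number): CommitFetchOutcome {
  if (status >= 200 && status < 300) return { kind: 'ok' };
  switch (status) {
    case 409:
      return { kind: 'empty-repository' };
    case 404:
      return { kind: 'not-found-or-no-access' };
    case 403:
    case 429:
      return { kind: 'forbidden-or-rate-limited' };
    default:
      return { kind: 'unexpected', status };
  }
}

console.log(classifyCommitFetchStatus(404).kind);
console.log(classifyCommitFetchStatus(409).kind);
```

With outcomes separated like this, a genuinely empty repository can be recorded as having zero commits, while a 404 is surfaced as a configuration or permission problem instead of being silently treated the same way.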

Why the Fix Works

The GitHub API addresses a repository by its exact owner/name path. Because the application now saves each repository's owner when it is first imported, it can request /repos/{owner}/{repo_name}/commits with the correct owner, whether the repository is personal, shared by a collaborator, or owned by an organization.
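As a small illustration, the endpoint can be derived from the stored record alone. The sketch below assumes a StoredRepo shape with the owner, name, and fullname fields from the schema; buildCommitsUrl and its consistency check are illustrative additions, not the original implementation.

```typescript
// Assumed subset of the stored Repository row.
interface StoredRepo {
  owner: string;
  name: string;
  fullname: string;
}

// Hypothetical builder: uses only stored metadata, never the session user.
function buildCommitsUrl(repo: StoredRepo): string {
  // Guard against rows where owner/name and fullname have drifted apart.
  if (`${repo.owner}/${repo.name}` !== repo.fullname) {
    throw new Error(`Inconsistent repository record: ${repo.fullname}`);
  }
  const owner = encodeURIComponent(repo.owner);
  const name = encodeURIComponent(repo.name);
  return `https://api.github.com/repos/${owner}/${name}/commits`;
}

const commitsUrl = buildCommitsUrl({
  owner: 'xyz-hackathon-team',
  name: 'Cheating_Detector',
  fullname: 'xyz-hackathon-team/Cheating_Detector',
});
console.log(commitsUrl);
// prints "https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits"
```

The function signature has no parameter for the logged-in username, so the original mistake cannot be reintroduced by accident at this call site.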

Lesson Learned

Never assume that data ownership or the namespace of a resource matches the identity of the user who fetched or accessed it.

External APIs often return resources that the user can access rather than resources the user owns. Always preserve the full namespace and ownership metadata when syncing external records into your local database.

Future Prevention

Preserve Namespaces: Store complete, unique identifiers (like owner and fullname or the full API URL) when fetching data from external APIs.
Avoid Constructing URLs from Assumptions: Instead of stitching URLs together from unrelated session variables, use the metadata returned in the API payload itself (e.g., the owner.login and full_name fields, or the commits_url template from GitHub responses). Note that html_url points at the repository's web page, not at the API.
Audit API Permissions vs. Ownership: Carefully review the documentation of third-party endpoints to understand the difference between resource ownership (who owns the resource) and resource access (who can view it).
Add Validation/Integration Tests: Write integration tests with mock repositories belonging to different owners or organizations to verify endpoint generation logic.
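The testing point above can be sketched as a table-driven check over fixtures with different owners. Everything here is a hedged illustration: commitsPath is a stand-in for the real endpoint builder, and the notes, another-dev, and shared-tool names are invented fixtures.

```typescript
interface RepoFixture {
  owner: string;
  name: string;
}

// Stand-in for the endpoint builder under test (an assumption, not the real code).
function commitsPath(repo: RepoFixture): string {
  return `/repos/${repo.owner}/${repo.name}/commits`;
}

const sessionUser = 'subhamoydatta703';

// One fixture per ownership category.
const cases: { label: string; repo: RepoFixture; expected: string }[] = [
  {
    label: 'personal',
    repo: { owner: 'subhamoydatta703', name: 'notes' },
    expected: '/repos/subhamoydatta703/notes/commits',
  },
  {
    label: 'organization',
    repo: { owner: 'xyz-hackathon-team', name: 'Cheating_Detector' },
    expected: '/repos/xyz-hackathon-team/Cheating_Detector/commits',
  },
  {
    label: 'collaborator',
    repo: { owner: 'another-dev', name: 'shared-tool' },
    expected: '/repos/another-dev/shared-tool/commits',
  },
];

function findFailures(): string[] {
  const failures: string[] = [];
  for (const { label, repo, expected } of cases) {
    const actual = commitsPath(repo);
    if (actual !== expected) {
      failures.push(`${label}: expected ${expected}, got ${actual}`);
    }
    // A path under the session user's namespace is only valid for repositories they own.
    if (repo.owner !== sessionUser && actual.startsWith(`/repos/${sessionUser}/`)) {
      failures.push(`${label}: path was built from the session user`);
    }
  }
  return failures;
}

const failures = findFailures();
console.log(failures.length === 0 ? 'all cases passed' : failures.join('\n'));
```

In a real suite the same fixtures would be fed to the production builder through the project's test runner; the hardcoded-username version would fail the organization and collaborator cases.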

Interview Questions

Why does the GitHub API GET /user/repos endpoint return repositories the user does not own?

The GET /user/repos endpoint is designed to list every repository the authenticated user has explicit permission to access, not only the ones under their own account. On GitHub, access is granted separately from ownership: a developer can hold read or write permissions on repositories owned by organizations or by other individual accounts. Returning only personal repositories would prevent applications from integrating with collaborative codebases.

What is the difference between Access Permissions and Resource Ownership?

Access Permissions: Determine who can view, edit, or interact with a resource (e.g. read/write permissions). This is dynamic and can be revoked.
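Resource Ownership: Defines the namespace or authority under which the resource exists (e.g. xyz-hackathon-team/Cheating_Detector). The owner namespace is part of the resource's path identifier, but it is not permanent: a repository can be transferred or its owner renamed, while the numeric repository ID stays the same.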

What are the best practices for designing database schemas that sync external resources?

Always Persist Authority/Namespace Metadata: Store the source's unique namespacing identifiers (owner, fullname, namespace) alongside local IDs.
Never Derive Identifiers from User Sessions: Do not use the active session user's attributes (like username or email) to build paths to other entities.
Use Source-provided URL Templates: Many API responses include ready-made URLs or URL templates (e.g., GitHub's url and commits_url fields). Storing these and expanding them as documented avoids manual URL concatenation bugs.
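For GitHub specifically, commits_url is a URL template that ends in an optional {/sha} segment, so it has to be expanded before it can be requested. The following sketch handles only that one expression and is not a general URI template implementation; expandCommitsUrl is a hypothetical helper name.

```typescript
// Expand the `{/sha}` expression found at the end of GitHub's commits_url field.
// With no sha the segment is dropped (list commits); with a sha it becomes "/<sha>".
function expandCommitsUrl(template: string, sha?: string): string {
  const segment = sha ? `/${encodeURIComponent(sha)}` : '';
  return template.replace('{/sha}', segment);
}

const template =
  'https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits{/sha}';

console.log(expandCommitsUrl(template));
// prints "https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits"
console.log(expandCommitsUrl(template, 'abc123'));
// prints "https://api.github.com/repos/xyz-hackathon-team/Cheating_Detector/commits/abc123"
```

Because the owner is already baked into the template by GitHub, this approach never needs to know who the logged-in user is.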

Key Takeaway

Never conflate the accessing user with the resource owner when integrating with third-party APIs; always persist full ownership and namespace metadata alongside the synced resource in your database schema.

APIs GitHub Backend Data-Modeling
