Scripts to capture issues and repos for testing #21
Open
sandy3w wants to merge 20 commits into hackforla:mixin from sandy3w:mixin
20 commits
ad27843 POC works (sandy3w)
9557b2b poc action completed (sandy3w)
c1400c6 updated path in yml (sandy3w)
3e8cce9 updated poc yml path again (sandy3w)
11bf599 org fetcher completed with workflow (sandy3w)
1670ec3 changed repo fetcher workflow name (sandy3w)
5955c9d added rate handling to org_fetcher, tried debugging based on output f… (sandy3w)
8c462ab Repo Labeling: updated auto commit message to be more intuitive, chan… (sandy3w)
b7fd2da added optional repos field (sandy3w)
c968c88 debug: only check if repo exists if repo is specified (sandy3w)
6584eab actions success: updated so that user field is optional (sandy3w)
5355648 test scenerio 2: only user specified (org is mand) (sandy3w)
0762b4a test scenerio 3: only repo specified (sandy3w)
afcc3cb debug: ensures code can handle no user case (sandy3w)
717a1ab test scenerio 1: both user and repo specified (sandy3w)
48383ff scenerio 2: retry (only user) (sandy3w)
5297d0c cleaned up repo_fetcher code (sandy3w)
4550c5a test scenerio 3: retry (only repo specified) (sandy3w)
d11781f debug: makes sure PR author are in output if assigness dont exist (sandy3w)
cf2cca9 updated code according to Andrew's suggestions (sandy3w)
@@ -0,0 +1,46 @@
name: Org Contributions Fetcher

on:
  workflow_dispatch:

jobs:
  fetch_contributions:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install dependencies
        run: pip install requests

      - name: Check for GH_TOKEN (PAT)
        run: |
          if [ -z "${{ secrets.PAT }}" ]; then
            echo "::error::GH_TOKEN (PAT) is not set"
            exit 1
          fi

      - name: Run Org Fetcher
        env:
          PAT: ${{ secrets.PAT }}
        run: python issue_contributor_fetcher/org_fetcher/org_fetcher.py

      - name: Upload CSV artifact
        uses: actions/upload-artifact@v4
        with:
          name: org-contributions
          path: issue_contributor_fetcher/org_fetcher/org_contr.csv
@@ -0,0 +1,49 @@
name: Repo Contributions Fetcher

on:
  workflow_dispatch:

jobs:
  fetch_issues_prs:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4
        with:
          ref: mixin

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install Python Dependencies
        run: pip install requests

      - name: Check for GH_TOKEN (PAT)
        run: |
          if [ -z "${{ secrets.PAT }}" ]; then
            echo "::error::GH_TOKEN (PAT) is not set"
            exit 1
          fi

      - name: Run Repo Fetcher Script
        env:
          GH_TOKEN: ${{ secrets.PAT }}
        run: |
          python issue_contributor_fetcher/repo_fetcher/repo_fetcher.py

      - name: Upload CSV Results
        uses: actions/upload-artifact@v4
        with:
          name: poc-results
          path: issue_contributor_fetcher/repo_fetcher/poc_results.csv
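The two workflows pass the same secret under different variable names: the org workflow exports it as `PAT`, while this one exports it as `GH_TOKEN`. The org script in this PR tolerates that by checking `PAT`, then `GITHUB_TOKEN`, then `GH_TOKEN`. A minimal sketch of that lookup as a standalone function follows; `resolve_token` and `TOKEN_ENV_NAMES` are illustrative names, not part of the PR, and whether `repo_fetcher.py` uses the same order is not visible in this diff.

```python
import os

# Lookup order used by org_fetcher.py in this PR; the constant name is hypothetical.
TOKEN_ENV_NAMES = ("PAT", "GITHUB_TOKEN", "GH_TOKEN")


def resolve_token(env=None, names=TOKEN_ENV_NAMES):
    """Return the first non-empty token found under any of `names`, else None."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value:  # skips both missing and empty-string values
            return value
    return None
```

The empty-string check matters because an unset secret expands to an empty string in a workflow, so the variable can exist without holding a usable token.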
@@ -0,0 +1 @@
@@ -0,0 +1,225 @@
import os
import json
import requests
import csv
import logging
import time

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Build a shared session with retries
def build_session():
    s = requests.Session()
    retry = Retry(
        total=4,
        connect=4,
        read=4,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=frozenset(["GET"]),
        raise_on_status=False,
    )
    adapter = HTTPAdapter(max_retries=retry, pool_connections=10, pool_maxsize=10)
    s.mount("https://", adapter)
    s.headers.update(HEADERS)
    return s

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

# Script paths
script_dir = os.path.dirname(os.path.abspath(__file__))
CONFIG_FILE = os.path.join(script_dir, "target_org.json")
OUTPUT_FILE = os.path.join(script_dir, "org_contr.csv")

# --- GitHub token from secret ---
GITHUB_TOKEN = (
    os.environ.get("PAT")
    or os.environ.get("GITHUB_TOKEN")
    or os.environ.get("GH_TOKEN")
)
if not GITHUB_TOKEN:
    logging.error("GitHub token not found in env (PAT/GITHUB_TOKEN/GH_TOKEN).")
    raise SystemExit(1)

HEADERS = {
    "Authorization": f"Bearer {GITHUB_TOKEN}",
    "Accept": "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
    "User-Agent": "org-contributions-fetcher/1.0",
}

# build session after header is defined
SESSION = build_session()
REQUEST_TIMEOUT = 15

# Load configuration
with open(CONFIG_FILE, "r") as f:
    config = json.load(f)

ORG = config.get("org")
USERS = config.get("users", [])
REPOS = config.get("repos", [])

if not ORG:
    logging.error("Org not specified in config.")
    exit(1)
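The script reads three keys from `target_org.json`: `org`, which is required, plus the optional lists `users` and `repos`. The file itself is not shown in this diff, so the sample values below are placeholders and `parse_config` is a hypothetical helper. This is a sketch of the same validation done on a string, so it can be exercised without touching the filesystem.

```python
import json


def parse_config(text):
    """Parse config JSON and return (org, users, repos); raise ValueError if org is missing."""
    config = json.loads(text)
    org = config.get("org")
    if not org:
        raise ValueError("Org not specified in config.")
    # Missing optional keys default to empty lists, as in the script.
    return org, config.get("users", []), config.get("repos", [])


# Sample values are placeholders, not taken from the PR.
sample = '{"org": "example-org", "users": ["octocat"]}'
org, users, repos = parse_config(sample)
```

Raising an exception instead of calling `exit(1)` at import time is a design choice of this sketch: it lets a caller or a test decide how to report the failure.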

# --- Helper functions ---
def fetch_repos(org):
    """Fetch all repos for an organization."""
    repos = []
    page = 1
    while True:
        url = f"https://api.github.com/orgs/{org}/repos"
        params = {"per_page": 100, "page": page}
        resp = SESSION.get(url, params=params, timeout=REQUEST_TIMEOUT)

        if resp.status_code != 200:
            logging.error(f"Error fetching repos for org {org}: {resp.status_code} {resp.text}")
            break

        data = resp.json()
        if not data:
            break

        repos.extend([r["full_name"] for r in data])
        page += 1

    logging.info(f"Found {len(repos)} repos in org {org}")
    return repos

def repo_exists(org, repo):
    """Check if a repo exists within the specified organization on GitHub."""
    full_name = f"{org}/{repo}"
    url = f"https://api.github.com/repos/{full_name}"
    resp = SESSION.get(url, timeout=REQUEST_TIMEOUT)

    if resp.status_code == 200:
        return True
    elif resp.status_code == 404:
        logging.warning(f"Repo '{repo}' does not exist in org '{org}'. Skipping.")
        return False
    else:
        logging.error(f"Error checking repo '{repo}' in org '{org}': {resp.status_code} {resp.text}")
        return False

def fetch_contributions(repo, users=None, max_retries=5):
    """Fetch issues and PRs for specified users in a repo, or all if users is empty."""
    users = users or []
    results = []

    if not users:
        # Fetch all contributions in the repo
        users_to_query = [None]
    else:
        users_to_query = users

    for u in users_to_query:
        if u:
            logging.info(f"Fetching contributions for user '{u}' in repo '{repo}'")
            query = f"repo:{repo} involves:{u}"
        else:
            logging.info(f"Fetching all contributions in repo '{repo}'")
            query = f"repo:{repo}"

        page = 1
        while True:
            url = "https://api.github.com/search/issues"
            params = {"q": query, "per_page": 100, "page": page}

            for attempt in range(max_retries):
                resp = SESSION.get(url, params=params, timeout=REQUEST_TIMEOUT)
                remaining = int(resp.headers.get("X-RateLimit-Remaining", 1))
                reset_time = int(resp.headers.get("X-RateLimit-Reset", time.time() + 60))

                if resp.status_code == 200:
                    break
                elif resp.status_code == 403 and "rate limit" in resp.text.lower():
                    wait_seconds = max(reset_time - time.time(), 5)
                    logging.warning(f"Rate limit hit. Waiting {int(wait_seconds)} seconds...")
                    time.sleep(wait_seconds)
                else:
                    logging.error(f"Error fetching contributions for {repo}: {resp.status_code} {resp.text}")
            else:
                logging.error(f"Failed after {max_retries} retries for page {page} in {repo}")
                break

            data = resp.json()
            items = data.get("items", [])
            if not items:
                break

            for item in items:
                assignees = item.get("assignees", [])
                if assignees:
                    # One row per assigned user
                    for assignee in assignees:
                        results.append({
                            "user": assignee["login"],
                            "repo": repo,
                            "number": item["number"],
                            "type": "PR" if "pull_request" in item else "Issue"
                        })
                else:
                    # If no assignees, use the author of the issue/PR
                    author = item.get("user", {}).get("login", "UNKNOWN")
                    results.append({
                        "user": author,
                        "repo": repo,
                        "number": item["number"],
                        "type": "PR" if "pull_request" in item else "Issue"
                    })

            if "next" not in resp.links:
                break

            page += 1

    return results
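The wait inside the retry loop is derived from the `X-RateLimit-Reset` header, an epoch timestamp in seconds, with a floor of 5 seconds and a fallback of roughly 60 seconds when the header is absent. The sketch below pulls that arithmetic into a pure function so it can be checked without network calls. `seconds_until_reset` is a hypothetical name, and `now` is passed in only to make the function testable.

```python
import time


def seconds_until_reset(headers, now=None, minimum=5, fallback=60):
    """Seconds to sleep before retrying, based on X-RateLimit-Reset (epoch seconds)."""
    now = time.time() if now is None else now
    reset_raw = headers.get("X-RateLimit-Reset")
    if reset_raw is None:
        # Header missing: wait the fallback, but never less than the floor.
        return max(fallback, minimum)
    return max(int(reset_raw) - now, minimum)
```

A reset time that is already in the past still yields the 5-second floor, which avoids a tight retry loop when the local clock and the API disagree.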

def org_fetcher(org, users=None, target_repos=None):
    users = users or []  # ensure it's a list even if None
    all_results = []

    if target_repos:
        logging.info(f"Fetching contributions for specified repos: {target_repos}")
        for repo in target_repos:
            full_repo_name = repo if '/' in repo else f"{org}/{repo}"
            repo_results = fetch_contributions(full_repo_name, users)
            all_results.extend(repo_results)
    else:
        repos = fetch_repos(org)
        for repo in repos:
            repo_results = fetch_contributions(repo, users)
            all_results.extend(repo_results)

    # Write CSV
    os.makedirs(os.path.dirname(OUTPUT_FILE), exist_ok=True)
    with open(OUTPUT_FILE, "w", newline="") as f:
        fieldnames = ["user", "org", "repo", "number", "type"]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()

        for item in all_results:
            if "/" in item["repo"]:
                org_name, repo_name = item["repo"].split("/", 1)
            else:
                org_name, repo_name = org, item["repo"]
            writer.writerow({
                "user": item["user"],
                "org": org_name,
                "repo": repo_name,
                "number": item["number"],
                "type": item["type"]
            })

    logging.info(f"Complete. Results written to {OUTPUT_FILE}")

if __name__ == "__main__":
    org_fetcher(ORG, USERS, REPOS)
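Each search result becomes one CSV row per assignee, falling back to the author when nobody is assigned, and the `org/repo` full name is split into separate columns at write time. The sketch below does that mapping in one place on hand-made items. `rows_for_item` is a hypothetical helper, and the sample dicts only imitate the fields the script reads; they are not real API output.

```python
def rows_for_item(item, full_name, default_org):
    """Build CSV rows (user, org, repo, number, type) for one search result."""
    if "/" in full_name:
        org_name, repo_name = full_name.split("/", 1)
    else:
        org_name, repo_name = default_org, full_name

    assignees = item.get("assignees") or []
    if assignees:
        users = [a["login"] for a in assignees]
    else:
        # No assignees: attribute the item to its author instead.
        users = [(item.get("user") or {}).get("login", "UNKNOWN")]

    kind = "PR" if "pull_request" in item else "Issue"
    return [
        {"user": u, "org": org_name, "repo": repo_name,
         "number": item["number"], "type": kind}
        for u in users
    ]


# Hand-made sample items (not real API output).
issue = {"number": 7, "assignees": [{"login": "a"}, {"login": "b"}], "user": {"login": "c"}}
pr = {"number": 8, "assignees": [], "user": {"login": "c"}, "pull_request": {}}
issue_rows = rows_for_item(issue, "example-org/example-repo", "example-org")
pr_rows = rows_for_item(pr, "example-repo", "example-org")
```

Building the row dict once per user also removes the duplicated literal that appears in both branches of the loop in `fetch_contributions`.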