Strange Ishizaka
Total Score: 109
Homework submissions
Homework 1: Introduction
Score: 11 = 3 (questions) + 1 (FAQ) + 7 (learning in public)
Learning in public links:
- https://docs.docker.com/compose/
- https://www.capellasolutions.com/blog/vector-databases-vs-elasticsearch-comparing-performance-for-ai-applications#:~:text=Elasticsearch%20uses%20inverted%20indexes%20for,Data%20model%3A%20Documents%20vs%20vectors
- https://www.elastic.co/what-is/vector-search
- https://www.elastic.co/search-labs/blog/large-language-models-elastic-code-langchain
- https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/01-docker-terraform/2_docker_sql
- https://community.openai.com/t/why-cantt-i-log-in-https-platform-openai-com/539877#:~:text=Go%20to%20openai.com%20%2D%20the,taken%20to%20a%20selection%20screen
- https://aws.amazon.com/what-is/retrieval-augmented-generation/#:~:text=Augmented%20Generation%20requirements%3F-,What%20is%20Retrieval%2DAugmented%20Generation%3F,sources%20before%20generating%20a%20response.
Homework 2: Open-Source LLMs
Score: 14 = 6 (questions) + 1 (FAQ) + 7 (learning in public)
Learning in public links:
- https://huggingface.co/blog/gemma
- https://www.promptingguide.ai/models/gemma
- https://www.superteams.ai/blog/steps-to-build-a-rag-pipeline-using-gemma-2b-llm
- https://medium.com/google-developer-experts/fine-tuning-gemma-2b-to-solve-math-problems-ac4921ed531e
- https://github.com/ollama/ollama/issues/1716
- https://ai.google.dev/gemma/docs/model_card
- https://huggingface.co/spaces/optimum/llm-perf-leaderboard
Homework 3: Vector Search
Score: 13 = 5 (questions) + 1 (FAQ) + 7 (learning in public)
Learning in public links:
- https://blog.maximeheckel.com/posts/building-magical-ai-powered-semantic-search/
- https://www.elastic.co/what-is/semantic-search
- https://stackoverflow.com/questions/8224472/how-to-determine-the-accuracy-of-a-semantic-search-engine
- https://www.researchgate.net/figure/Overall-and-conditionalised-on-the-hit-rate-mean-proportions-of-semantic-and-names_tbl2_236332383
- https://www.sciencedirect.com/topics/computer-science/cosine-similarity#:~:text=Cosine%20similarity%20measures%20the%20similarity,document%20similarity%20in%20text%20analysis.
- https://towardsdatascience.com/cosine-similarity-how-does-it-measure-the-similarity-maths-behind-and-usage-in-python-50ad30aad7db
- https://www.pinecone.io/learn/vector-database/
Homework 4: Evaluation and monitoring
Score: 14 = 6 (questions) + 1 (FAQ) + 7 (learning in public)
Learning in public links:
- https://hyperskill.org/learn/step/29669
- https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499#:~:text=ROUGE%2D1%20recall%20can%20be,number%20of%20unigrams%20in%20R.&text=Then%2C%20ROUGE%2D1%20F1%2D,the%20standard%20F1%2Dscore%20formula.
- https://medium.com/@MUmarAmanat/llm-evaluation-with-rouge-0ebf6cf2aed4
- https://www.pinecone.io/learn/series/vector-databases-in-production-for-busy-engineers/rag-evaluation/
- https://docs.confident-ai.com/docs/guides-rag-evaluation
- https://numpy.org/doc/stable/reference/generated/numpy.average.html
- https://myscale.com/blog/mastering-data-science-cosine-similarity-vs-dot-product-insights/#:~:text=Comparing%20Cosine%20Similarity%20and%20Dot%20Product&text=While%20cosine%20similarity%20focuses%20on,into%20vector%20relationships%20and%20similarities.
Homework 5: Ingestion pipeline
Score: 11 = 6 (questions) + 0 (FAQ) + 5 (learning in public)
Learning in public links:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
- https://developer.nvidia.com/blog/an-easy-introduction-to-multimodal-retrieval-augmented-generation/
- https://blog.meilisearch.com/full-text-search-vs-vector-search/
- https://www.pinecone.io/learn/series/vector-databases-in-production-for-busy-engineers/rag-evaluation/
- https://github.com/mage-ai
Workshop: Open-Source Data Ingestion for RAGs with dlt
Score: 5 = 0 (questions) + 1 (FAQ) + 4 (learning in public)
Learning in public links:
- https://dlthub.com/docs/tutorial/load-data-from-an-api
- https://dlthub.com/docs/getting-started
- https://www.databricks.com/glossary/retrieval-augmented-generation-rag#:~:text=Retrieval%20augmented%20generation%2C%20or%20RAG%2C%20is%20an%20architectural%20approach%20that,applications%20by%20leveraging%20custom%20data.
- https://medium.com/adevinta-tech-blog/data-engineer-2-0-part-ii-retrieval-augmented-generation-47eada9045c9
Project submissions
Project attempt 2
Project score: 16 (Passed)
Score: 41 = 16 (project) + 9 (peer review) + 14 (learning in public / project) + 1 (learning in public / peer review) + 1 (FAQ)
Learning in public links:
- https://docs.docker.com/compose/
- https://grafana.com/docs/grafana/latest/panels-visualizations/
- https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-upload-python
- https://www.django-rest-framework.org/topics/documenting-your-api/
- https://www.django-rest-framework.org/topics/documenting-your-api/#third-party-packages-for-openapi-support
- https://www.djangoproject.com/start/
- https://www.elastic.co/search-labs/tutorials/search-tutorial/vector-search/hybrid-search
- https://opster.com/guides/elasticsearch/machine-learning/elasticsearch-hybrid-search/
- https://www.evidentlyai.com/ranking-metrics/ndcg-metric#:~:text=Normalized%20Discounted%20Cumulative%20Gain%20(NDCG)%20is%20a%20ranking%20quality%20metric,DCG%20representing%20a%20perfect%20ranking.
- https://www.evidentlyai.com/ranking-metrics/precision-recall-at-k
- https://www.evidentlyai.com/ranking-metrics/mean-reciprocal-rank-mrr
- https://grafana.com/blog/2024/07/18/a-complete-guide-to-llm-observability-with-opentelemetry-and-grafana-cloud/
- https://docs.nautobot.com/projects/core/en/stable/development/core/docker-compose-advanced-use-cases/
- https://www.evidentlyai.com/llm-guide/llm-as-a-judge